PREFACE


This Second Course continues the development of the theory and applications 
of stochastic processes as promised in the preface of A First Course. We 
emphasize a careful treatment of basic structures in stochastic processes in 
symbiosis with the analysis of natural classes of stochastic processes arising from 
the biological, physical, and social sciences. 

Apart from expanding on the topics treated in the first edition of this work but 
not incorporated in A First Course, this volume presents an extensive intro- 
ductory account of the fundamental concepts and methodology of diffusion 
processes and the closely allied theory of stochastic differential equations and 
stochastic integrals. A multitude of physical, engineering, biological, social, and 
managerial phenomena are either well approximated or reasonably modeled by 
diffusion processes; and modern approaches to diffusion processes and 
stochastic differential equations provide new perspectives and techniques 
impinging on many subfields of pure and applied mathematics, among them 
partial differential equations, dynamical systems, optimal control problems, 
statistical decision procedures, operations research, studies of economic sys- 
tems, population genetics, and ecology models. 

A new chapter discusses the elegant and far-reaching distributional formulas 
now available for a wide variety of functionals (e.g., first-passage time, 
maximum, order statistics, occupation time) of the process of sums of
independent random variables. The identities, formulas, and results in this 
chapter have important applications in queueing and renewal theory, for 
statistical decision procedures, and elsewhere. 




The logical dependence of the chapters in A Second Course is shown by the 
diagram below (consult also the preface to A First Course on the relationships of 
Chapters 1-9). 

The book can be coupled to A First Course in several ways, depending on the
background and interests of the students. The discussion of Markov chains in A 
First Course can be supplemented with parts of the more advanced Chapters 
10-12, and 14. The material on fluctuation theory of sums of independent 
random variables (Chapters 12 and 17), perhaps supplemented by some parts of 
the chapter on queueing processes (Chapter 18), may be attractive and useful to 
students of operations research and statistics. Chapter 16, on compounding 
stochastic processes, is designed as an enticing introduction to a hierarchy of 
relevant models, including models of multiple species population growth, of 
migration and demographic structures, of point processes, and compositions of 
Poisson processes (Lévy processes).

We strongly recommend devoting a semester to diffusion processes (Chapter 
15). The dependence relationships of the sections of Chapter 15 are diagrammed
below. Section 1 provides a generalized description of various characterizations 
of diffusion. The examples of Section 2, which need not be absorbed in their 
totality, are intended to hint at the rich diversity of natural models of diffusion 
processes; the emphasis on biological examples reflects the authors’ personal 
interests, but diffusion models abound in other sciences as well. Sections 3-5 
point up the utility and tractability of diffusion process analysis. Section 6 takes 
up the boundary classification of one-dimensional diffusion processes; Section 
7, on the same topic, is more technical. Section 8 provides constructions of 
diffusions with different types of boundary behavior. Sections 9 and 10 treat a 
number of topics motivated by problems of population genetics and statistics. 
The formal (general) theory of Markov processes with emphasis on applications 
to diffusions is elaborated in Sections 11 and 12. Section 13 exhibits the spectral 
representations for several classical diffusion models, which are of some interest 
because of their connections with classical special functions. The key concepts, a 
host of examples, and some methods of stochastic differential equations and 
stochastic integrals are introduced in Sections 14-16. 

[Diagram: dependence relationships of the sections of Chapter 15]


As noted in the earlier prefaces, we have drawn freely on the thriving literature 
of applied and theoretical stochastic processes without citing specific articles. A 
few representative books are listed at the end of each chapter and may be 
consulted profitably as a guide to more advanced material. 

We express our gratitude to Stanford University, the Weizmann Institute of 
Science in Israel, and Cornell University for providing a rich intellectual 
environment and facilities indispensable for the writing of this text. The first 
author is grateful for the continuing grant support provided by the National 
Science Foundation and the National Institutes of Health that permitted an 
unencumbered concentration on a number of the concepts of this book and on 
its various drafts. We are also happy to acknowledge our indebtedness to many 
colleagues who have offered constructive criticisms, among them Professor M. 
Taqqu of Cornell, Dr. S. Tavaré of the University of Utah, Professors D.
Iglehart and M. Harrison of Stanford, and Professor J. Kingman of Oxford. 
Finally, we thank our students P. Glynn, E. Cameron, J. Raper, R. Smith, L. 
Tierney, and P. Williams for their assistance in checking the problems, and for 
their helpful reactions to early versions of Chapter 15. 


PREFACE TO A FIRST COURSE 


The purposes, level, and style of this new edition conform to the tenets set 
forth in the original preface. We continue with our tack of developing 
simultaneously theory and applications, intertwined so that they refurbish and 
elucidate each other. 

We have made three main kinds of changes. First, we have enlarged on the 
topics treated in the first edition. Second, we have added many exercises and 
problems at the end of each chapter. Third, and most important, we have 
supplied, in new chapters, broad introductory discussions of several classes of 
stochastic processes not dealt with in the first edition, notably martingales, 
renewal and fluctuation phenomena associated with random sums, stationary 
stochastic processes, and diffusion theory. 

Martingale concepts and methodology have provided a far-reaching ap- 
paratus vital to the analysis of all kinds of functionals of stochastic processes. In 
particular, martingale constructions serve decisively in the investigation of 
stochastic models of diffusion type. Renewal phenomena are almost equally 
important in the engineering and managerial sciences especially with reference 
to examples in reliability, queueing, and inventory systems. We discuss renewal 
theory systematically in an extended chapter. Another new chapter explores the 
theory of stationary processes and its applications to certain classes of 
engineering and econometric problems. Still other new chapters develop the 
structure and use of diffusion processes for describing certain biological and 
physical systems and fluctuation properties of sums of independent random 
variables useful in the analyses of queueing systems and other facets of 
operations research. 

The logical dependence of chapters is shown by the diagram below. Section 1
of Chapter 1 can be reviewed without worrying about details. Only Sections 5 
and 7 of Chapter 7 depend on Chapter 6. Only Section 9 of Chapter 9 depends on 
Chapter 5. 

An easy one-semester course adapted to the junior-senior level could consist
of Chapter 1, Sections 2 and 3 preceded by a cursory review of Section 1, 
Chapter 2 in its entirety, Chapter 3 excluding Sections 5 and/or 6, and Chapter 4 
excluding Sections 3, 7, and 8. The content of the last part of the course is left to 
the discretion of the lecturer. An option of material from the early sections of 
any or all of Chapters 5-9 would be suitable. 

The problems at the end of each chapter are divided into two groups: the first, 
more or less elementary; the second, more difficult and subtle. 

The scope of the book is quite extensive, and on this account, it has been 
divided into two volumes. We view the first volume as embracing the main 
categories of stochastic processes underlying the theory and most relevant for 
applications. In A Second Course we introduce additional topics and applica- 
tions and delve more deeply into some of the issues of A First Course. We have 
organized the edition to attract a wide spectrum of readers, including theorists 
and practitioners of stochastic analysis pertaining to the mathematical,
engineering, physical, biological, social, and managerial sciences.

The second volume of this work, A Second Course in Stochastic Processes, 
will include the following chapters: (10) Algebraic Methods in Markov Chains; 
(11) Ratio Theorems of Transition Probabilities and Applications; (12) Sums of 
Independent Random Variables as a Markov Chain; (13) Order Statistics, 
Poisson Processes, and Applications; (14) Continuous Time Markov Chains; 
(15) Diffusion Processes; (16) Compounding Stochastic Processes; (17) Fluctu- 
ation Theory of Partial Sums of Independent Identically Distributed Random 
Variables; (18) Queueing Processes. 

As noted in the first preface, we have drawn freely on the thriving literature of
applied and theoretical stochastic processes. A few representative references are 
included at the end of each chapter; these may be profitably consulted for more 
advanced material. 

We express our gratitude to the Weizmann Institute of Science, Stanford
University, and Cornell University for providing a rich intellectual environment 
and facilities indispensable for the writing of this text. The first author is grateful 
for the continuing grant support provided by the Office of Naval Research that 
permitted an unencumbered concentration on a number of the concepts and 
drafts of this book. We are also happy to acknowledge our indebtedness to many 
colleagues who have offered a variety of constructive criticisms. Among others, 
these include Professors P. Brockwell of La Trobe, J. Kingman of Oxford, D. 
Iglehart and S. Ghurye of Stanford, and K. Itô and S. Stidham, Jr. of Cornell. 
We also thank our students M. Nedzela and C. Macken for their assistance in 
checking the problems and help in reading proofs. 


SAMUEL KARLIN 
HOWARD M. TAYLOR


PREFACE TO FIRST EDITION 


Stochastic processes concern sequences of events governed by probabilistic 
laws. Many applications of stochastic processes occur in physics, engineering, 
biology, medicine, psychology, and other disciplines, as well as in other branches 
of mathematical analysis. The purpose of this book is to provide an introduction 
to the many specialized treatises on stochastic processes. Specifically, I have 
endeavored to achieve three objectives: (1) to present a systematic introductory 
account of several principal areas in stochastic processes, (2) to attract and 
interest students of pure mathematics in the rich diversity of applications of 
stochastic processes, and (3) to make the student who is more concerned with 
application aware of the relevance and importance of the mathematical subtleties
underlying stochastic processes. 

The examples in this book are drawn mainly from biology and engineering but 
there is an emphasis on stochastic structures that are of mathematical interest or 
of importance in more than one discipline. A number of concepts and problems 
that are currently prominent in probability research are discussed and illustrated. 

Since it is not possible to discuss all aspects of this field in an elementary text, 
some important topics have been omitted, notably stationary stochastic 
processes and martingales. Nor is the book intended in any sense as an 
authoritative work in the areas it does cover. On the contrary, its primary aim is 
simply to bridge the gap between an elementary probability course and the many 
excellent advanced works on stochastic processes. 

Readers of this book are assumed to be familiar with the elementary theory of 
probability as presented in the first half of Feller’s classic Introduction to 
Probability Theory and Its Applications. In Section 1, Chapter 1 of my book the
necessary background material is presented and the terminology and notation of 
the book established. Discussions in small print can be skipped on first reading. 
Exercises are provided at the close of each chapter to help illuminate and
expand on the theory. 

This book can serve for either a one-semester or a two-semester course, 
depending on the extent of coverage desired. 

In writing this book, I have drawn on the vast literature on stochastic 
processes. Each chapter ends with citations of books that may profitably be
consulted for further information, including in many cases bibliographical 
listings. 

I am grateful to Stanford University and to the U.S. Office of Naval Research
for providing facilities, intellectual stimulation, and financial support for the 
writing of this text. Among my academic colleagues I am grateful to Professor 
K. L. Chung and Professor J. McGregor of Stanford for their constant 
encouragement and helpful comments; to Professor J. Lamperti of Dartmouth, 
Professor J. Kiefer of Cornell, and Professor P. Ney of Wisconsin for offering a 
variety of constructive criticisms; to Dr. A. Feinstein for his detailed checking of 
substantial sections of the manuscript, and to my students P. Milch, B. Singer, 
M. Feldman, and B. Krishnamoorthi for their helpful suggestions and their 
assistance in organizing the exercises. Finally, I am indebted to Gail Lemmond 
and Rosemarie Stampfel for their superb technical typing and all-around 
administrative care. 


SAMUEL KARLIN 
Chapter 10


ALGEBRAIC METHODS IN MARKOV
CHAINS


Many important results concerning Markov chains can be obtained by using either 
purely algebraic methods or a combination of probabilistic and algebraic techniques. 
We will develop a number of these techniques in the present chapter. In order not to 
disrupt the continuity of presentation, we present here only a brief summary of some 
basic facts of matrix theory needed immediately. A fairly complete discussion of 
these results is given in the Appendix to A First Course. 


1: Preliminaries 


Of fundamental importance in considerations of Markov chains is the computa- 
tion of the n-step transition probabilities. (Special methods are developed in 
Sections 4-6 applicable in the case where the Markov chain is a random walk.) 
To this end, we develop the necessary machinery involving the theory of eigen-
values and eigenvectors.†


(a) Spectral Representation 


Let $A$ be an $n \times n$ matrix. A nonzero $n$-dimensional vector $x$ which satisfies the
relation $Ax = \lambda x$ for some number $\lambda$ is called a right eigenvector of $A$, with
corresponding eigenvalue $\lambda$. If $xA = \lambda x$, we call $x$ a left eigenvector of $A$. If there
exists a complete linearly independent family $x^{(1)}, \ldots, x^{(n)}$ of right (or,
alternatively, left) eigenvectors of $A$, then there exists a linearly independent family
$\varphi^{(1)}, \ldots, \varphi^{(n)}$ of right eigenvectors of $A$ and a linearly independent family
$\psi^{(1)}, \ldots, \psi^{(n)}$ of left eigenvectors of $A$ which are biorthogonal. This means that‡

\[
(\varphi^{(i)}, \psi^{(j)}) = \sum_{k=1}^{n} \varphi_{ik} \bar{\psi}_{jk} = \delta_{ij} =
\begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j, \end{cases}
\]
† The reader unfamiliar with the basic theory of eigenvalues and eigenvectors of matrices
should consult the Appendix of A First Course in Stochastic Processes at this point.
‡ $(a, b)$ denotes the inner product of the vectors $a$ and $b$.
where $\varphi^{(i)} = (\varphi_{i1}, \ldots, \varphi_{in})$, $\psi^{(j)} = (\psi_{j1}, \ldots, \psi_{jn})$, and $\bar{\psi}_{jk}$ denotes the complex
conjugate of $\psi_{jk}$. In this case, the matrix $A$ is said to be diagonalizable. Let


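\[
\Phi = \begin{pmatrix} \varphi_{11} & \cdots & \varphi_{n1} \\ \vdots & & \vdots \\ \varphi_{1n} & \cdots & \varphi_{nn} \end{pmatrix}, \qquad
\Psi = \begin{pmatrix} \psi_{11} & \cdots & \psi_{1n} \\ \vdots & & \vdots \\ \psi_{n1} & \cdots & \psi_{nn} \end{pmatrix}, \qquad
\Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix},
\]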
where $\lambda_1, \ldots, \lambda_n$ are the (not necessarily distinct) eigenvalues associated with
the eigenvectors $\varphi^{(1)}, \ldots, \varphi^{(n)}$. (Notice that we have not labeled the elements of $\Phi$
in the usual order.) Then $A$ possesses a spectral representation as a product of
three special matrices:

\[
A = \Phi \Lambda \Psi.
\]


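Using the relation $(\varphi^{(i)}, \psi^{(j)}) = \delta_{ij}$, we can verify by direct calculation that
$\Psi\Phi = \Phi\Psi = I$ ($I$ = the identity matrix). Then $A^2 = \Phi\Lambda\Psi\Phi\Lambda\Psi = \Phi\Lambda^2\Psi$
and generally

\[
A^m = \Phi \Lambda^m \Psi, \tag{1.1}
\]

where obviously

\[
\Lambda^m = \begin{pmatrix} \lambda_1^m & 0 & \cdots & 0 \\ 0 & \lambda_2^m & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \lambda_n^m \end{pmatrix}.
\]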


When A is a Markov matrix, formula (1.1) provides a convenient representation 
of the mth-step transition probability matrix. Its effective use requires deter- 
mining a complete set of left and right eigenvectors. 
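
As a numerical illustration of formula (1.1), the following minimal sketch computes a matrix power from an eigen-decomposition. It assumes NumPy is available and that the matrix is diagonalizable; the function name `spectral_power` and the two-state matrix are hypothetical choices made only for this example. Rather than constructing biorthogonal left eigenvectors explicitly, the sketch takes $\Psi = \Phi^{-1}$, which has the required property $\Psi\Phi = I$.

```python
import numpy as np

def spectral_power(A, m):
    """Return A**m computed as Phi @ Lambda**m @ Psi for a diagonalizable A."""
    eigvals, Phi = np.linalg.eig(A)   # columns of Phi are right eigenvectors
    Psi = np.linalg.inv(Phi)          # rows of Psi are left eigenvectors; Psi @ Phi = I
    # Scaling column h of Phi by eigvals[h]**m is the product Phi @ diag(eigvals**m).
    return (Phi * eigvals**m) @ Psi

# Illustrative two-state Markov matrix (an assumption, not taken from the text).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
P5 = spectral_power(P, 5)             # 5-step transition probabilities
```

For a Markov matrix the rows of the result should again sum to 1, which gives a quick sanity check on the decomposition.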


(b) Positive Matrices 


Let $A$ be a real matrix which has at least one positive element and no negative
elements; we write $A > 0$ and call $A$ positive. If every element of $A$ is positive, we
write $A \gg 0$ and call $A$ strictly positive. The following results are known.

To each $A > 0$ there corresponds a number $r(A) \geq 0$, the spectral radius of
$A$, which is zero if and only if $A^m = 0$ for some integer $m > 0$. In any case there
are positive vectors $f, x > 0$, such that $Ax = r(A)x$, $fA = r(A)f$. If $\lambda$ is any
eigenvalue of $A$, then $|\lambda| \leq r(A)$; if $|\lambda| = r(A)$, then $\eta = \lambda/r(A)$ is a root of unity,
i.e., $\eta^k = 1$ for some integer $k$, and $\eta^m r(A)$ is an eigenvalue of $A$ for $m = 1, 2, \ldots$.
Finally, suppose that $A^m \gg 0$ for some $m > 0$; then $x$ and $f$ are strictly positive
vectors and uniquely determined up to a constant factor. Moreover $|\lambda| < r(A)$
if $\lambda$ is an eigenvalue of $A$ different from $r(A)$.
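
The statements above can be checked numerically on a small example. The sketch below assumes NumPy; the $2 \times 2$ matrix and the helper name `perron` are hypothetical and serve only as an illustration. It locates the eigenvalue of largest modulus, rescales the associated right and left eigenvectors so that their components sum to 1, and collects the remaining eigenvalues for comparison with $r(A)$.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # every element is positive, so A is strictly positive

def perron(M):
    """Eigenvalue of largest modulus and a matching eigenvector scaled to sum to 1."""
    eigvals, vecs = np.linalg.eig(M)
    k = np.argmax(np.abs(eigvals))
    v = vecs[:, k].real
    return eigvals[k].real, v / v.sum()   # dividing by the sum also fixes the sign

r, x = perron(A)       # right eigenvector: A x = r x
_, f = perron(A.T)     # left eigenvector:  f A = r f
others = [lam for lam in np.linalg.eigvals(A) if not np.isclose(lam, r)]
```

In this example both `x` and `f` come out with strictly positive components, and the single remaining eigenvalue has modulus smaller than `r`.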


2: Relations of Eigenvalues and Recurrence Classes 


The foregoing results have immediate application to the study of finite-state 
Markov chains. 

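Let $P = \|P_{ij}\|$, $i, j = 1, \ldots, n$, be a matrix of transition probabilities.
Evidently $P > 0$. Let $x$ be any vector satisfying $\sum_{i=1}^{n} x_i = 1$. Then

\[
xP = \left( \sum_{i=1}^{n} x_i P_{i1}, \; \sum_{i=1}^{n} x_i P_{i2}, \; \ldots, \; \sum_{i=1}^{n} x_i P_{in} \right).
\]

Now

\[
\sum_{j=1}^{n} \left( \sum_{i=1}^{n} x_i P_{ij} \right) = \sum_{i=1}^{n} x_i \sum_{j=1}^{n} P_{ij} = \sum_{i=1}^{n} x_i. \tag{2.1}
\]

We claim that $xP \geq \lambda x$ cannot hold with $x > 0$ for any value of $\lambda > 1$, so that
$r(P) \leq 1$. In fact, summing the components of both sides in $xP \geq \lambda x$ as in (2.1)
yields $\sum_{i=1}^{n} x_i \geq \lambda \sum_{i=1}^{n} x_i$. Since $\sum_{i=1}^{n} x_i > 0$, we can cancel this factor, which
implies that $\lambda \leq 1$.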

On the other hand, the vector (1, ..., 1) is immediately seen to be a right 
eigenvector of P with eigenvalue 1; thus r(P) = 1. 

The property that 1 is an eigenvalue with a corresponding positive left 
eigenvector for any finite Markov matrix can also be deduced from Theorem 1.3 
of Chapter 3.† We know that in a finite state Markov chain at least one state (and
therefore at least one class) is positive recurrent. Relabeling the states if necessary,
we may assume that the states $i = 1, \ldots, s$ form a positive recurrent class.
Therefore $P_{ij} = 0$ for any pair $i, j$ for which $i \in \{1, \ldots, s\}$ and $j \in \{s + 1, \ldots, n\}$.
Thus, P has the form 


\[
P = \begin{pmatrix} P_1 & 0 \\ A & B \end{pmatrix} \tag{2.2}
\]

and $P_1$ forms an $s \times s$ Markov matrix. Now the basic limit theorem of Markov
chains (see Theorem 1.3 of Chapter 3) asserts the existence of $\pi_1, \ldots, \pi_s$ such that


\[
\pi_i > 0, \qquad \sum_{i=1}^{s} \pi_i = 1,
\]

and

\[
\pi_j = \sum_{i=1}^{s} \pi_i P_{ij}, \qquad j = 1, \ldots, s.
\]
† Chapters 1-9 are in A First Course in Stochastic Processes (Second Edition, 1975).
Let $x^0 = (\pi_1, \ldots, \pi_s, 0, \ldots, 0)$; we may verify at once because of the special
structure of $P$ as displayed in (2.2) that $x^0 P = x^0$. A slightly more detailed
analysis yields the following: 


Theorem 2.1. If P is a finite Markov matrix, then the multiplicity of the eigen- 
value 1 is equal to the number of recurrent classes associated with P. 


Proof. We have seen above that if $C_1$ is a recurrent class of states, then there
exists a left eigenvector $x^{(1)} > 0$ for the eigenvalue 1 such that $x_i^{(1)} = 0$ if $i \notin C_1$.
Similarly, with each recurrent class $C_2, C_3, \ldots$ there is associated a positive
eigenvector $x^{(2)}, x^{(3)}, \ldots$, with eigenvalue 1 such that $x_i^{(h)} = 0$ if $i \notin C_h$. Since
distinct classes are disjoint, it is clear that $x^{(1)}, x^{(2)}, \ldots$ are linearly independent
vectors, and so the multiplicity of the eigenvalue 1 is at least the number of
distinct recurrent classes. To prove the reverse inequality suppose that $xP = x$.
Then $xP^m = x$ for $m = 1, 2, \ldots$, i.e.,


\[
\sum_{i=1}^{n} x_i P_{ij}^{m} = x_j, \qquad j = 1, \ldots, n, \quad m \geq 1.
\]

But if $j$ is a transient state, we know that $\lim_{m \to \infty} P_{ij}^{m} = 0$ for all $i$. It follows that
$x_j = 0$ for every transient state $j$, and so we can write

\[
\sum_{h=1}^{r} \sum_{i \in C_h} x_i P_{ij} = x_j, \qquad j \in \bigcup_{h=1}^{r} C_h,
\]

where $C_1, \ldots, C_r$ are the recurrent classes. Also $P_{ij} = 0$ if $i$ and $j$ are in distinct
recurrent classes; therefore, we have

\[
\sum_{i \in C_h} x_i P_{ij} = x_j \qquad \text{for } j \in C_h, \quad h = 1, \ldots, r.
\]

If $x_i \neq 0$ for some $i \in C_h$, then by Theorem 1.3 of Chapter 3 there exists a
constant $a_h$ such that

\[
x_i = a_h x_i^{(h)}, \qquad i \in C_h.
\]

Thus

\[
x = \sum_{h=1}^{r} a_h x^{(h)},
\]

from which we see that the $x^{(h)}$ form a basis for the manifold of left eigenvectors
with eigenvalue 1. ∎
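
Theorem 2.1 is easy to check numerically on a small chain. The following sketch assumes NumPy; the four-state matrix is a hypothetical example with two recurrent classes (states 0 and 1 communicate, state 2 is absorbing) and one transient state (state 3), using 0-based labels. It counts the eigenvalues equal to 1 and, as an independent check, computes the dimension of the space of left eigenvectors $x$ with $xP = x$.

```python
import numpy as np

# Hypothetical chain: recurrent classes {0, 1} and {2}; state 3 is transient.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.3, 0.0, 0.3, 0.4]])

eigvals = np.linalg.eigvals(P)
multiplicity = int(np.sum(np.isclose(eigvals, 1.0)))   # eigenvalues equal to 1

# Dimension of the solution space of x (P - I) = 0, i.e. of left eigenvectors for 1.
n = P.shape[0]
null_dim = n - np.linalg.matrix_rank(P.T - np.eye(n))
```

Both counts agree with the number of recurrent classes in this example.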


PROBABILISTIC INTERPRETATION OF EIGENVALUES AND EIGENVECTORS 
Let us now consider the manifold of right eigenvectors of P with eigenvalue 1.


It turns out that there is a basis for this manifold which has a very simple
probabilistic interpretation. In fact, if $C_1, \ldots, C_r$ are the recurrent classes associated
with $P$, we define $\rho_i^{(h)}$ to be the probability that starting from $i$ the state of the
system will eventually lie in $C_h$, i.e.,

\[
\rho_i^{(h)} = \Pr\{X_n \in C_h \text{ for some } n = 1, 2, \ldots \mid X_0 = i\}.
\]

Clearly

\[
\rho_i^{(h)} = \begin{cases} 1 & \text{for } i \in C_h, \\ 0 & \text{for } i \in C_j, \; j \neq h, \end{cases} \tag{2.3}
\]

since it is not possible to leave a recurrent class. If we define $\rho^{(h)} = (\rho_1^{(h)}, \ldots, \rho_n^{(h)})$,
$h = 1, \ldots, r$, the preceding equations show at once that the vectors $\rho^{(1)}, \ldots, \rho^{(r)}$
are linearly independent. Furthermore, the $\rho^{(h)}$ satisfy the equations

\[
\rho_i^{(h)} = \sum_{j=1}^{n} P_{ij} \rho_j^{(h)}, \qquad i = 1, \ldots, n, \quad h = 1, \ldots, r,
\]

[Eq. (3.4) of Chapter 3], which shows that $\rho^{(1)}, \ldots, \rho^{(r)}$ are right eigenvectors of
$P$ with eigenvalue 1. As the $\rho^{(h)}$ are linearly independent and their number $r$ is
the multiplicity of the eigenvalue 1, they form a basis for the right eigenmanifold
corresponding to the eigenvalue 1. Finally, we observe by direct evaluation with
the aid of (2.3) that

\[
(\rho^{(i)}, x^{(j)}) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}
\]

since the only nonzero components of $x^{(h)}$ are those whose indices are in $C_h$,
and their sum is just 1.
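
The probabilistic basis just described can be computed directly. The sketch below assumes NumPy; the chain, the 0-based class labels, and the helper name `absorption_vector` are hypothetical. For each recurrent class it sets the components on that class to 1, the components on the other recurrent classes to 0, and obtains the components on the transient states by solving the linear system that the eigenvector equation imposes there.

```python
import numpy as np

# Hypothetical chain: recurrent classes {0, 1} and {2}; state 3 is transient.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.3, 0.0, 0.3, 0.4]])
classes = [[0, 1], [2]]
transient = [3]

def absorption_vector(P, cls, transient):
    """Probability, from each state, of eventually entering the class `cls`."""
    rho = np.zeros(P.shape[0])
    rho[cls] = 1.0                                   # already inside the class
    Q = P[np.ix_(transient, transient)]              # transient-to-transient block
    b = P[np.ix_(transient, cls)].sum(axis=1)        # one-step entry probabilities
    rho[transient] = np.linalg.solve(np.eye(len(transient)) - Q, b)
    return rho

rhos = [absorption_vector(P, c, transient) for c in classes]

# Left eigenvectors: stationary distribution of each class, padded with zeros.
x1 = np.array([2 / 7, 5 / 7, 0.0, 0.0])
x2 = np.array([0.0, 0.0, 1.0, 0.0])
gram = np.array([[rho @ x for x in (x1, x2)] for rho in rhos])
```

Each vector in `rhos` satisfies $P\rho = \rho$, and `gram` is the $2 \times 2$ identity matrix, in agreement with the biorthogonality relation above.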
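Let us assume now that $P$ has a spectral representation, and that the
eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ are labeled so that $1 = \lambda_1 = \cdots = \lambda_r \geq |\lambda_{r+1}| \geq |\lambda_{r+2}|
\geq \cdots$ and $\lambda_{r+1} \neq 1$. Then we can take $\varphi^{(1)} = \rho^{(1)}, \ldots, \varphi^{(r)} = \rho^{(r)}$ and $\psi^{(1)} =
x^{(1)}, \psi^{(2)} = x^{(2)}, \ldots, \psi^{(r)} = x^{(r)}$ (see Appendix to A First Course). From

\[
P^m = \Phi \Lambda^m \Psi
\]

we obtain

\[
P_{ij}^{m} = \sum_{h=1}^{n} \varphi_{hi} \lambda_h^{m} \psi_{hj}
= \varphi_{1i}\psi_{1j} + \cdots + \varphi_{ri}\psi_{rj} + \sum_{h=r+1}^{n} \varphi_{hi} \lambda_h^{m} \psi_{hj}.
\]

Suppose that $P$ has no eigenvalue, different from 1, whose modulus equals 1;
then $|\lambda_h| < 1$, $h = r + 1, \ldots, n$, and as $m \to \infty$,

\[
\sum_{h=r+1}^{n} \varphi_{hi} \lambda_h^{m} \psi_{hj} \to 0
\]

and the rate of convergence is of the order at least $|\lambda_{r+1}|^m$. We shall see shortly
that $|\lambda_h| < 1$, $h = r + 1, \ldots, n$, if and only if $P$ has no periodic recurrent classes
(Theorem 3.1 below). Assuming that $P$ has no periodic recurrent classes and
recalling that $x_j^{(h)} = \psi_{hj}$, $h = 1, \ldots, r$, $j = 1, \ldots, n$, is different from zero if and
only if $j \in C_h$, we see that

\[
\varphi_{1i}\psi_{1j} = \varphi_{2i}\psi_{2j} = \cdots = \varphi_{ri}\psi_{rj} = 0 \qquad \text{for } j \text{ transient}.
\]

Thus, if $j$ is transient, $P_{ij}^{m} = \sum_{h=r+1}^{n} \varphi_{hi} \lambda_h^{m} \psi_{hj}$ and this tends to zero at the rate
$|\lambda_{r+1}|^m$ as $m \to \infty$. Now if $i, j \in C_h$, then among the first $r$ terms in the expression
for $P_{ij}^{m}$ the only nonvanishing one is $\varphi_{hi}\psi_{hj}$; but $\varphi_{hi} = 1$ (recall that $\varphi_{hi} = \rho_i^{(h)}$)
and $\psi_{hj} = \pi_j = \lim_{m \to \infty} P_{ij}^{m}$. We see generally for all states $j$ that $\pi_j - P_{ij}^{m}$ goes
to zero at least as fast as $|\lambda_{r+1}|^m$ as $m \to \infty$.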

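Now let us assume that in addition to $|\lambda_{r+1}| < 1$ we have the special
situation that $|\lambda_{r+2}| < |\lambda_{r+1}|$. Let, as usual, $T$ denote the set of all transient states,
$i, j \in T$; we wish to find the following limit:

\[
\lim_{m \to \infty} \Pr\{X_m = j \mid X_0 = i, \; X_m \in T\},
\]

i.e., the limiting value ($m \to \infty$) of the probability that starting from state $i$ the
process is in the transient state $j$, given that at time $m$, $X_m$ is in a transient state.
We have

\[
\Pr\{X_m = j \mid X_0 = i, \; X_m \in T\} = \frac{P_{ij}^{m}}{\sum_{l \in T} P_{il}^{m}}.
\]

As we have seen before, for $j$ transient $P_{ij}^{m} = \sum_{h=r+1}^{n} \varphi_{hi} \lambda_h^{m} \psi_{hj}$.
Since $|\lambda_{r+1}| > |\lambda_{r+2}|$, we readily find that

\[
\lim_{m \to \infty} \frac{P_{ij}^{m}}{\sum_{l \in T} P_{il}^{m}}
= \frac{\varphi_{r+1,i}\,\psi_{r+1,j}}{\sum_{l \in T} \varphi_{r+1,i}\,\psi_{r+1,l}}
= \frac{\psi_{r+1,j}}{\sum_{l \in T} \psi_{r+1,l}},
\]

assuming that the denominator does not vanish. If the denominator does
vanish, we have to examine the terms in $\sum_{h=r+1}^{n} \varphi_{hi} \lambda_h^{m} \psi_{hj}$ containing $\lambda_{r+2}$ and
other eigenvalues whose modulus equals $|\lambda_{r+2}|$, and so forth.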
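
The limiting conditional distribution can be illustrated numerically. The sketch below assumes NumPy; the three-state chain (state 0 absorbing, states 1 and 2 transient, 0-based labels) is a hypothetical example chosen so that its eigenvalues are 1, 0.8, and 0.3, hence $r = 1$ and $|\lambda_{r+2}| < |\lambda_{r+1}| < 1$. It compares the normalized left eigenvector for $\lambda_{r+1}$, restricted to $T$, with the conditional probabilities obtained from a high power of $P$.

```python
import numpy as np

# Hypothetical chain: state 0 is absorbing, states 1 and 2 are transient.
P = np.array([[1.0, 0.0, 0.0],
              [0.2, 0.5, 0.3],
              [0.2, 0.2, 0.6]])
T = [1, 2]

eigvals, left = np.linalg.eig(P.T)       # columns of `left` are left eigenvectors of P
order = np.argsort(-np.abs(eigvals))     # indices sorted by decreasing modulus
psi = left[:, order[1]].real             # left eigenvector for lambda_{r+1} (here r = 1)
limit = psi[T] / psi[T].sum()            # psi_{r+1,j} / sum over l in T of psi_{r+1,l}

Pm = np.linalg.matrix_power(P, 60)       # m = 60 steps, starting from state i = 1
conditional = Pm[1, T] / Pm[1, T].sum()  # Pr{X_m = j | X_0 = 1, X_m in T}
```

In this example both `limit` and `conditional` are approximately $(0.4, 0.6)$, and the result does not depend on which transient state is used as the starting point.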


3: Periodic Classes 


We wish to give a more complete description of the structure of a periodic chain. 
The simplest class with period d is clearly one in which there are d states 1,...,d 
and 


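\[
P_{12} = P_{23} = \cdots = P_{d-1,d} = P_{d1} = 1, \qquad
P = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix}.
\]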


A less trivial example may be formed by replacing the individual states $1, \ldots, d$
by families $C_1, \ldots, C_d$ of states and defining the $P_{ij}$ in such a way that
$P_{ij} \neq 0$ only if $i \in C_1$, $j \in C_2$, or $i \in C_2$, $j \in C_3$, ..., or $i \in C_d$, $j \in C_1$. The matrix $P$
then takes the form

\[
P = \begin{pmatrix} 0 & P_1 & 0 & \cdots & 0 \\ 0 & 0 & P_2 & \cdots & 0 \\ \vdots & & & & \vdots \\ P_d & 0 & 0 & \cdots & 0 \end{pmatrix}.
\]


At the same time, we can define P;; so that every two states communicate. 
We will now prove that every periodic class is of this form. Let d be the period of 
the class $W$ and assume that the states are labeled by $1, 2, \ldots, M$. Let $C_1$ consist
of the states of $W$ which can be reached from state 1 in some multiple of $d$
transitions; i.e., $j \in C_1$ if and only if $P_{1j}^{nd} > 0$ for some integer $n \geq 0$. For each
$r = 1, \ldots, d - 1$, we define $C_{r+1}$ to consist of those states which can be reached
from state 1 in $r$ plus some multiple of $d$ transitions; i.e., $j \in C_{r+1}$ if and only if
$P_{1j}^{nd+r} > 0$ for some integer $n \geq 0$.

First we show that if $j \in C_1$ then $P_{j1}^{h} > 0$ implies $h = md$ for some $m \geq 0$.
In fact, since $j \in C_1$ implies that $P_{1j}^{nd} > 0$ for some $n \geq 0$, it follows that $P_{11}^{nd+h} \geq
P_{1j}^{nd} P_{j1}^{h} > 0$ (cf. Chapter 2, Theorem 3.1), and so by the definition of period
$nd + h$ must be divisible by $d$; hence $h$ is. Next we show that if $i \in C_1$, $j \in C_{r+1}$
then $P_{ij}^{h} > 0$ implies that $h = nd + r$ for some $n \geq 0$. In fact, let $P_{j1}^{s} > 0$ for
some $s > 0$; $P_{11}^{dq} > 0$ for some $q > 0$; and $P_{1j}^{md+r} > 0$ for some $m \geq 0$. Thus, if
$w = s + dq + md + r$ then $P_{11}^{w} \geq P_{1j}^{md+r} P_{j1}^{s} P_{11}^{dq} > 0$, and $w$ is a multiple of $d$;
therefore $s + r$ also is a multiple of $d$. But $P_{i1}^{h+s} \geq P_{ij}^{h} P_{j1}^{s} > 0$, so that $h + s$ is
divisible by $d$. Combining these two results, we infer that $h - r$ is divisible by $d$,
and therefore, $h = nd + r$ for some $n \geq 0$.

We leave it to the reader to verify that the above results imply that $C_1, \ldots, C_d$
are disjoint and nonempty, that $\bigcup_{i=1}^{d} C_i = W$, and that $i \in C_r$ requires $P_{ij} = 0$
for every $j \notin C_{r+1}$ where $C_{d+1} = C_1$.
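
The construction of the classes $C_1, \ldots, C_d$ can be carried out mechanically for a finite irreducible chain. The following sketch assumes NumPy; the function name `cyclic_classes`, the 0-based state labels (state 0 plays the role of state 1 in the text), and the four-state example are hypothetical. It tracks which states can be reached from state 0 in exactly $h$ steps, takes the period as the greatest common divisor of the observed return times, and assigns each state to a class according to the residue modulo $d$ of the first step count at which it is reached. The search horizon $2M^2$ is a generous bound chosen for simplicity, not an optimized one.

```python
import numpy as np
from math import gcd

def cyclic_classes(P):
    """Period d and cyclic classes of a finite irreducible chain (0-based states)."""
    M = P.shape[0]
    B = (P > 0).astype(int)                # one-step reachability pattern
    reach = np.eye(M, dtype=int)           # reach[i, j] = 1 iff j is reachable from i in h steps
    d = 0
    first_hit = [None] * M                 # first step count at which j is reached from state 0
    first_hit[0] = 0
    for h in range(1, 2 * M * M + 1):
        reach = (reach @ B > 0).astype(int)
        if reach[0, 0]:
            d = gcd(d, h)                  # gcd of all return times to state 0 seen so far
        for j in range(M):
            if first_hit[j] is None and reach[0, j]:
                first_hit[j] = h
    classes = [[] for _ in range(d)]
    for j in range(M):
        classes[first_hit[j] % d].append(j)   # residue r puts j in the class C_{r+1}
    return d, classes

# Hypothetical chain of period 2 on four states.
P = np.array([[0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0]])
d, classes = cyclic_classes(P)
```

For this example the chain alternates between the two classes `[0, 2]` and `[1, 3]`.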

Having thus analyzed the matrix of a periodic class, we can now demonstrate 
an earlier assertion concerning the occurrence of eigenvalues of modulus 1 of a 
Markov transition matrix. 


Theorem 3.1. If P is the transition matrix of a finite irreducible periodic Markov 
chain with period d, then the dth roots of unity are eigenvalues of P, each with 
multiplicity 1, and there are no other eigenvalues of modulus 1. 


Proof. Let $D_1, \ldots, D_d$ be the “moving classes” of the process as established
above, i.e., $i \in D_r$ implies $P_{ij} = 0$ for every $j \notin D_{r+1}$. It is no loss of generality to
assume that $D_1 = \{1, \ldots, n_1\}$, $D_2 = \{n_1 + 1, \ldots, n_1 + n_2\}$, ..., $D_d = \{M -
n_d + 1, \ldots, M\}$. From the definition of the moving class it follows that

\[
P^d = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & A_d \end{pmatrix},
\]

where $A_i$ is an $n_i \times n_i$ Markov matrix. Furthermore, for each $i$, $A_i^m \gg 0$ for some
integer $m > 0$ (see Problem 5, Chapter 2). Thus, $A_i$ has a strictly positive left
eigenvector $p^{(i)}$, with eigenvalue 1, of algebraic multiplicity 1. Owing to the form
of $P^d$, it is clear that, by adjoining an appropriate number of zeros on one or
both sides of each $p^{(i)}$, we determine linearly independent vectors $x^{(1)}, \ldots, x^{(d)}$
such that

\[
x^{(i)} = x^{(i)} P^d, \qquad i = 1, \ldots, d.
\]


Let us consider the vectors $y^{(1)} = x^{(1)}$, $y^{(2)} = x^{(1)}P$, ..., $y^{(d)} = x^{(1)}P^{d-1}$. Since
the only nonzero components of $x^{(1)}$ are those with indices $1, 2, \ldots, n_1$, and
observing that $P_{ij}^{h}$ may differ from zero only if the moving class in which $i$ lies
agrees with that of $j$ after precisely $h$ steps, we see that the only nonzero
components of $y^{(i)}$ are those with indices $(n_1 + \cdots + n_{i-1} + 1, \ldots, n_1 + \cdots + n_i)$.
This implies that the vectors $y^{(i)}$ ($i = 1, \ldots, d$) are linearly independent.
Furthermore

\[
y^{(i)} P^d = x^{(1)} P^{i-1} P^d = x^{(1)} P^d P^{i-1} = x^{(1)} P^{i-1} = y^{(i)}.
\]

It follows that if we restrict attention to the $n_i$-dimensional linear space obtained
by considering only those components of $y^{(i)}$ whose indices lie in $D_i$, we obtain a
left eigenvector with eigenvalue 1 for $A_i$.

Because the eigenvalue 1 has simple multiplicity for $A_i$, it follows that each
$y^{(i)}$ is a constant multiple of $x^{(i)}$. Actually, if we normalize each $x^{(1)}, \ldots, x^{(d)}$ by
the condition $\sum_{i=1}^{M} x_i^{(h)} = 1$, $h = 1, \ldots, d$, then, in fact, $y^{(h)} = x^{(h)}$, $h = 1, \ldots, d$.
Accordingly, we may write $x^{(2)} = x^{(1)}P$, $x^{(3)} = x^{(2)}P$, ..., $x^{(1)} = x^{(d)}P$.

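Let $\omega = e^{2\pi i/d}$. Combining the above equations in the indicated manner,
we obtain

\[
\begin{aligned}
(x^{(1)} + x^{(2)} + x^{(3)} + \cdots + x^{(d)})P
  &= x^{(1)} + x^{(2)} + \cdots + x^{(d)}, \\
(x^{(1)} + \omega x^{(2)} + \omega^2 x^{(3)} + \cdots + \omega^{d-1} x^{(d)})P
  &= \omega^{-1}(x^{(1)} + \omega x^{(2)} + \cdots + \omega^{d-1} x^{(d)}), \\
(x^{(1)} + \omega^2 x^{(2)} + \omega^4 x^{(3)} + \cdots + \omega^{2(d-1)} x^{(d)})P
  &= \omega^{-2}(x^{(1)} + \omega^2 x^{(2)} + \cdots + \omega^{2(d-1)} x^{(d)}), \\
  &\;\;\vdots \\
(x^{(1)} + \omega^{d-1} x^{(2)} + \omega^{2(d-1)} x^{(3)} + \cdots + \omega^{(d-1)(d-1)} x^{(d)})P
  &= \omega^{-(d-1)}(x^{(1)} + \omega^{d-1} x^{(2)} + \cdots + \omega^{(d-1)(d-1)} x^{(d)}).
\end{aligned}
\]

The linear independence of the $x^{(i)}$ ensures that none of the vectors appearing
above are zero. These relations exhibit the property that the $d$th roots of unity
are all eigenvalues of $P$.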
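
Suppose next that $xP = \lambda x$ for some nonzero $x$. Then $xP^d = \lambda^d x$. Looking at
the contracted vectors $z^{(1)} = (x_1, \ldots, x_{n_1})$, $z^{(2)} = (x_{n_1+1}, \ldots, x_{n_1+n_2})$, ...,
$z^{(d)} = (x_{M-n_d+1}, \ldots, x_M)$, we see that

\[
z^{(i)} A_i = \lambda^d z^{(i)}, \qquad i = 1, \ldots, d.
\]

Since at least one of the $z^{(i)}$ is nonzero, and for each $A_i$ there is an $m$ such that
$A_i^m \gg 0$, either $\lambda^d = 1$ or $|\lambda^d| < 1$. If $\lambda^d = 1$, then there are constants $c_1, \ldots, c_d$
such that

\[
z^{(i)} = c_i x^{(i)}, \qquad i = 1, \ldots, d
\]

(with $x^{(i)}$ restricted to its components in $D_i$), and so we see that
$x = c_1 x^{(1)} + \cdots + c_d x^{(d)}$. Now

\[
\lambda x = xP = c_1 x^{(2)} + c_2 x^{(3)} + \cdots + c_d x^{(1)}
\]

or

\[
\lambda c_1 x^{(1)} + \cdots + \lambda c_d x^{(d)} = c_d x^{(1)} + c_1 x^{(2)} + \cdots + c_{d-1} x^{(d)}.
\]

Since the $x^{(i)}$ are linearly independent, we have

\[
\lambda c_1 = c_d, \quad \lambda c_2 = c_1, \quad \ldots, \quad \lambda c_d = c_{d-1},
\]

or

\[
c_d = \lambda c_1, \quad c_{d-1} = \lambda c_d = \lambda^2 c_1, \quad \ldots, \quad
c_2 = \lambda c_3 = \lambda^{d-1} c_1, \quad c_1 = \lambda c_2 = \lambda^{d} c_1,
\]

since $\lambda^d = 1$, and this means that $x$ is plainly a constant multiple of one of the
eigenvectors of $P$ already constructed. ∎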
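
Theorem 3.1 can be observed numerically on a small periodic chain. The sketch below assumes NumPy; the four-state matrix is a hypothetical irreducible chain of period $d = 3$ with moving classes $\{0\}$, $\{1, 2\}$, $\{3\}$ (0-based labels). It computes the eigenvalues, selects those of modulus 1, and counts how many eigenvalues coincide with each cube root of unity.

```python
import numpy as np

# Hypothetical irreducible chain with period 3: {0} -> {1, 2} -> {3} -> {0}.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.0]])
d = 3

eigvals = np.linalg.eigvals(P)
roots = np.exp(2j * np.pi * np.arange(d) / d)                    # the d-th roots of unity

on_circle = eigvals[np.isclose(np.abs(eigvals), 1.0)]            # eigenvalues of modulus 1
matched = [int(np.sum(np.isclose(eigvals, w))) for w in roots]   # multiplicity of each root
```

Here exactly three eigenvalues lie on the unit circle, one at each cube root of unity, and the fourth eigenvalue is 0.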


The case of an arbitrary Markov matrix P follows easily from the preceding 
theorem. We have 


Theorem 3.2. If P is a finite Markov matrix, then any eigenvalue of P of modulus
1 is a root of unity. The dth roots of unity are eigenvalues of P if and only if P has
a recurrent class with period d. The multiplicity of each dth root of unity is just 
the number of recurrent classes of period d. 


The proof is essentially identical with that of Theorem 3.1. Since $\lambda x = xP$
implies

\[
\lambda^m x = xP^m
\]

or

\[
\lambda^m x_j = \sum_{i=1}^{n} x_i P_{ij}^{m},
\]

then, letting $m \to \infty$, we see that $x_j = 0$ if $j$ is transient. We may therefore restrict
attention to the recurrent states, and the theorem immediately reduces to the
case considered in the previous theorem.


4: Special Computational Methods in Markov Chains 


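Let $P$ be the transition probability matrix of a random walk on the nonnegative
integers with probability $\tfrac{1}{2}$ of going to each of the two neighboring states from
state $k$ ($k \geq 1$) and with a reflecting barrier at the origin; that is,

\[
P = \begin{pmatrix}
0 & 1 & 0 & 0 & \cdots \\
\tfrac{1}{2} & 0 & \tfrac{1}{2} & 0 & \cdots \\
0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} & \cdots \\
\vdots & & & &
\end{pmatrix}.
\]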


To obtain the probability of reaching state $l$ from state $k$ in $n$ steps we could
multiply matrix $P$ by itself $n$ times and seek out the element $P_{kl}^{n}$ in the $k$th row
and $l$th column of the matrix $P^n$. This method, however, is very cumbersome and
lengthy in practice.
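
For small $n$ the direct method is nevertheless easy to carry out by machine, which gives a useful reference value for the formulas developed later. The sketch below assumes NumPy; the function names `reflecting_walk_matrix` and `n_step` are hypothetical. Since the state space is infinite, the matrix is truncated to finitely many states; because the walk moves one unit per step, a truncation beyond state $k + n$ does not affect the $n$-step probabilities from state $k$.

```python
import numpy as np

def reflecting_walk_matrix(size):
    """Upper-left size x size corner of the reflecting random walk matrix."""
    P = np.zeros((size, size))
    P[0, 1] = 1.0                        # reflecting barrier at the origin
    for k in range(1, size - 1):
        P[k, k - 1] = 0.5
        P[k, k + 1] = 0.5
    P[size - 1, size - 2] = 0.5          # truncated row: the upward move is dropped
    return P

def n_step(k, l, n):
    """n-step transition probability from state k to state l by direct powering."""
    size = max(k, l) + n + 2             # the truncated row cannot be reached within n steps
    return np.linalg.matrix_power(reflecting_walk_matrix(size), n)[k, l]

p_00 = n_step(0, 0, 2)   # path 0 -> 1 -> 0
p_11 = n_step(1, 1, 2)   # paths 1 -> 0 -> 1 and 1 -> 2 -> 1
```

The two values are $\tfrac{1}{2}$ and $\tfrac{3}{4}$, as a direct enumeration of the paths confirms.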

A second approach is to attempt to generalize the method of eigenvalues and 
eigenvectors as developed in Section 2. In the case of infinite matrices this 
cannot always be done. However, for matrices of the same form as P above or, 
more generally, transition probability matrices corresponding to random walks, 
there is available an infinite analog to the representation formula (1.1). 

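We proceed to obtain $P_{kl}^n$ in a manner which will illustrate a general method
applicable to arbitrary random walks.

Adding the two trigonometric identities
\[ \cos(\alpha \pm \beta) = \cos\alpha\cos\beta \mp \sin\alpha\sin\beta \]
leads to the identity
\[ \cos\alpha\cos\beta = \tfrac12\cos(\alpha + \beta) + \tfrac12\cos(\alpha - \beta). \tag{*} \]
Let $\alpha = \theta$ and $\beta = k\theta$ (k = 1, 2, ...). We get
\[ \cos\theta\cos k\theta = \tfrac12\cos(k+1)\theta + \tfrac12\cos(k-1)\theta. \tag{4.1} \]
Since the elements in the kth row of matrix P are
\[ \begin{aligned}
&P_{k0} = 0, \ \ldots, \ P_{k,k-1} = \tfrac12, \ P_{kk} = 0, \ P_{k,k+1} = \tfrac12, \ P_{k,k+2} = 0, \ \ldots, \qquad k = 2, 3, \ldots, \\
&P_{10} = \tfrac12, \ P_{11} = 0, \ P_{12} = \tfrac12, \ P_{13} = 0, \ \ldots, \\
&P_{00} = 0, \ P_{01} = 1, \ P_{02} = 0, \ \ldots,
\end{aligned} \]
Equation (4.1) can be written as
\[ \cos\theta\cos k\theta = \sum_{r=0}^{\infty} P_{kr}\cos r\theta, \qquad k = 0, 1, \ldots. \tag{4.2} \]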


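Next, multiply this equation by $\cos\theta$. Then
\[ \cos^2\theta\cos k\theta = \sum_{r=0}^{\infty} P_{kr}\cos\theta\cos r\theta. \tag{4.3} \]
But by (4.2) $\cos\theta\cos r\theta$ can be expressed as
\[ \cos\theta\cos r\theta = \sum_{s=0}^{\infty} P_{rs}\cos s\theta. \]
Substituting this in (4.3), we have
\[ \begin{aligned}
\cos^2\theta\cos k\theta &= \sum_{r=0}^{\infty}\Bigl(P_{kr}\sum_{s=0}^{\infty} P_{rs}\cos s\theta\Bigr) \\
&= \sum_{s=0}^{\infty}\cos s\theta\Bigl(\sum_{r=0}^{\infty} P_{kr}P_{rs}\Bigr) \\
&= \sum_{s=0}^{\infty} P_{ks}^2\cos s\theta.
\end{aligned} \]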

Although it simplifies the notation to take all summations from 0 to $\infty$, note
that all but a finite number of terms are equal to zero.

After n - 1 iterations of the procedure of multiplying by $\cos\theta$ and then
interchanging the order of summation we finally obtain
\[ \cos^n\theta\cos k\theta = \sum_{r=0}^{\infty} P_{kr}^n\cos r\theta. \tag{4.4} \]
Multiply both sides of this equation by $\cos s\theta$ and integrate with respect to $\theta$
from 0 to $2\pi$:
\[ \begin{aligned}
\int_0^{2\pi}\cos^n\theta\cos k\theta\cos s\theta\,d\theta
&= \int_0^{2\pi}\sum_{r=0}^{\infty} P_{kr}^n\cos r\theta\cos s\theta\,d\theta \\
&= \sum_{r=0}^{\infty} P_{kr}^n\int_0^{2\pi}\cos r\theta\cos s\theta\,d\theta.
\end{aligned} \tag{4.5} \]


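Using the identity (*) with $\alpha = r\theta$ and $\beta = s\theta$, it is simple to show that
\[ \int_0^{2\pi}\cos r\theta\cos s\theta\,d\theta =
\begin{cases}
0 & \text{for } r \ne s, \\
\pi & \text{for } r = s \ge 1, \\
2\pi & \text{for } r = s = 0.
\end{cases} \tag{4.6} \]
From (4.6) and (4.5) we immediately obtain
\[ P_{ks}^n =
\begin{cases}
\dfrac{1}{\pi}\displaystyle\int_0^{2\pi}\cos^n\theta\cos k\theta\cos s\theta\,d\theta, & s \ne 0, \\[2ex]
\dfrac{1}{2\pi}\displaystyle\int_0^{2\pi}\cos^n\theta\cos k\theta\,d\theta, & s = 0.
\end{cases} \tag{4.7} \]
This integral can be computed without difficulty for given n, k, and s.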
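Formula (4.7) lends itself to a quick numerical check. The following Python sketch is an illustration only and is not part of the original text: the function names, the truncation size, and the grid size are assumptions made for the example. It compares the integral in (4.7) with the entries of $P^n$ obtained by brute-force multiplication of a truncated copy of P. The truncation does not affect entries with k + n smaller than the number of retained states, since such paths cannot reach the cut-off.

```python
import math

def reflecting_walk_power(n, size):
    """n-th power of the reflecting-barrier matrix P, truncated to `size` states."""
    P = [[0.0] * size for _ in range(size)]
    P[0][1] = 1.0                      # reflecting barrier at the origin
    for k in range(1, size):
        P[k][k - 1] = 0.5
        if k + 1 < size:
            P[k][k + 1] = 0.5
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = [[sum(result[i][r] * P[r][j] for r in range(size))
                   for j in range(size)] for i in range(size)]
    return result

def p_integral(n, k, s, grid=4096):
    """Right-hand side of (4.7), evaluated by the rectangle rule on [0, 2*pi].

    The integrand is a trigonometric polynomial of degree n + k + s, so the
    rule is exact up to rounding whenever grid > n + k + s.
    """
    h = 2 * math.pi / grid
    total = sum(math.cos(t * h) ** n * math.cos(k * t * h) * math.cos(s * t * h)
                for t in range(grid))
    norm = 2 * math.pi if s == 0 else math.pi
    return total * h / norm

# Compare both computations for n = 6 and a few starting states.
Pn = reflecting_walk_power(6, 12)
worst = max(abs(Pn[k][s] - p_integral(6, k, s))
            for k in range(4) for s in range(10))
```

In this sketch `worst` holds the largest discrepancy between the two computations over the sampled entries.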

The general method, of which the preceding technique constitutes a particu- 
larly simple example, is the following. 

Suppose we are given a random walk process on the nonnegative integers
whose matrix P of one-step transition probabilities is given by

\[ P = \begin{Vmatrix}
r_0 & p_0 & 0 & 0 & \cdots \\
q_1 & r_1 & p_1 & 0 & \cdots \\
0 & q_2 & r_2 & p_2 & \cdots \\
\vdots & & & & \ddots
\end{Vmatrix}, \tag{4.8} \]

where $q_n + r_n + p_n = 1$, $q_n > 0$, $p_n > 0$, $r_n \ge 0$, for n = 1, 2, ..., and $r_0 + p_0 = 1$,
$p_0 > 0$, $r_0 \ge 0$. (Note for future reference, however, that none of the
following general results are dependent upon the conditions $q_n + r_n + p_n = 1$,
n = 1, 2, ..., and $r_0 + p_0 = 1$.) Let us consider the following system of equations:
\[ xQ_k(x) = q_k Q_{k-1}(x) + r_k Q_k(x) + p_k Q_{k+1}(x), \qquad k = 1, 2, \ldots, \tag{4.9} \]


with the “initial” specifications $Q_0(x) = 1$ and $Q_1(x) = (x - r_0)/p_0$. Since
$p_n > 0$ for all n = 0, 1, 2, ..., it is clear that $Q_n(x)$, $n \ge 2$, are determined recursively
from (4.9) and that $Q_n(x)$ is a polynomial in x of exact degree n. Now
a theorem whose proof is beyond the scope of this book asserts that there exists
a function $\sigma(x)$ on the interval [-1, 1] that is nondecreasing and not identically
constant, such that
\[ \int_{-1}^{1} Q_k(x)Q_s(x)\,d\sigma(x)
\begin{cases}
= 0 & \text{if } k \ne s, \\
> 0 & \text{if } k = s.
\end{cases} \tag{4.10} \]


We express the property (4.10) by the statement that the functions $Q_k(x)$,
k = 0, 1, ..., are “orthogonal polynomials with respect to the distribution $\sigma(x)$
over the interval [-1, 1].” The function $\sigma(x)$ is unique up to an additive constant.
This general theorem enables us to derive an explicit expression for the $P_{ks}^n$. In
fact, Eq. (4.9), in view of the prescriptions for $Q_0(x)$ and $Q_1(x)$, may be written
\[ xQ_k(x) = \sum_{r=0}^{\infty} P_{kr}Q_r(x), \qquad k = 0, 1, \ldots. \tag{4.11} \]
Multiplying both sides by x and substituting (4.11) into the right side of the
resulting equation, we have
\[ x^2 Q_k(x) = \sum_{r=0}^{\infty} P_{kr}\sum_{s=0}^{\infty} P_{rs}Q_s(x) = \sum_{s=0}^{\infty} P_{ks}^2 Q_s(x), \qquad k = 0, 1, \ldots. \tag{4.12} \]
Proceeding in this fashion, we obtain
\[ x^n Q_k(x) = \sum_{r=0}^{\infty} P_{kr}^n Q_r(x), \qquad k = 0, 1, \ldots, \quad n = 1, 2, \ldots. \tag{4.13} \]


Multiplying both sides by $Q_s(x)$ and integrating over [-1, 1] with respect to
$d\sigma(x)$, we find, by virtue of the orthogonality relations (4.10), that
\[ \begin{aligned}
\int_{-1}^{1} x^n Q_k(x)Q_s(x)\,d\sigma(x) &= \sum_{r=0}^{\infty} P_{kr}^n\int_{-1}^{1} Q_r(x)Q_s(x)\,d\sigma(x) \\
&= P_{ks}^n\int_{-1}^{1} Q_s^2(x)\,d\sigma(x).
\end{aligned} \]
Thus
\[ P_{ks}^n = \frac{\int_{-1}^{1} x^n Q_k(x)Q_s(x)\,d\sigma(x)}{\int_{-1}^{1} Q_s^2(x)\,d\sigma(x)}, \tag{4.14} \]
which is the desired formula.

As remarked earlier, it should be observed that this procedure bears a
similarity to the diagonalization method of Section 1. In fact, equations (4.9)
assert simply that for each value of x the infinite vector $(Q_0(x), Q_1(x), \ldots)$ is a
formal eigenvector of the matrix (4.8) for the eigenvalue x. Since there is a
continuum of eigenvalues, it is reasonable to expect that a discrete sum analogous
to that obtained in Sections 1 and 2 for $P_{ks}^n$ is in general impossible. Actually, it
turns out that the appropriate generalization of the spectral representation (1.1)
is (4.14). The underlying mathematical phenomenon involves the existence of a
“continuous spectrum” in addition to the (possibly empty) discrete spectrum,
which is generally the case when one is dealing with infinite matrices. The precise
mathematical elaboration of these ideas is beyond the level of this book. We
will, nevertheless, illustrate this theory with the discussion of some additional
examples.

It might appear that the method, elegant though it may be in theory, is of
little value in practice. To find $P_{ks}^n$, it is necessary to determine the polynomials
$\{Q_k(x)\}_{k=0}^{\infty}$ and further to obtain the distribution $\sigma(x)$, concerning which we
have so far asserted nothing more than its existence. The actual situation,
however, is far better than this pessimistic evaluation. First of all, a great deal is
known about general orthogonal polynomials, from which one can deduce
important theoretical results concerning the behavior of the $P_{ks}^n$, and in particular,
ratios of them, as $n \to \infty$.
Second, a random walk which arises in a concrete problem will in all
likelihood have a transition probability matrix which is far more regular than
the general form (4.8). For example, one might have $p_0 = p_1 = p_2 = \cdots$,
$q_1 = q_2 = q_3 = \cdots$, or else $p_n = p_{n+1} = \cdots$, $q_n = q_{n+1} = \cdots$ for some integer
n. In these cases, as well as others, it can be shown that the polynomials $Q_k(x)$
are combinations of various classical polynomial systems which have been
studied extensively.
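The recursion (4.9) and the identity (4.13) can be explored numerically without knowing the distribution with respect to which the polynomials are orthogonal. The Python sketch below is an illustration under stated assumptions: the function names and the particular probabilities are hypothetical choices, not taken from the text. It generates the values $Q_0(x), Q_1(x), \ldots$ from (4.9) and compares $x^n Q_k(x)$ with $\sum_r P_{kr}^n Q_r(x)$ for a truncated copy of the matrix (4.8).

```python
def q_polynomials(x, p, q, r, count):
    """Values Q_0(x), ..., Q_{count-1}(x) generated by recursion (4.9).

    p[k], q[k], r[k] are the entries of matrix (4.8); q[0] is unused.
    """
    Q = [1.0, (x - r[0]) / p[0]]
    for k in range(1, count - 1):
        Q.append((x * Q[k] - q[k] * Q[k - 1] - r[k] * Q[k]) / p[k])
    return Q[:count]

def n_step_row(k, n, p, q, r, size):
    """Row k of P^n for the tridiagonal matrix (4.8), truncated to `size` states."""
    row = [0.0] * size
    row[k] = 1.0
    for _ in range(n):
        new = [0.0] * size
        for i, mass in enumerate(row):
            if mass == 0.0:
                continue
            new[i] += mass * r[i]
            if i + 1 < size:
                new[i + 1] += mass * p[i]
            if i >= 1:
                new[i - 1] += mass * q[i]
        row = new
    return row

# Hypothetical example: p_0 = 0.6, r_0 = 0.4 and p_k = 0.3, r_k = 0.2, q_k = 0.5 for k >= 1.
size = 12
p = [0.6] + [0.3] * (size - 1)
r = [0.4] + [0.2] * (size - 1)
q = [0.0] + [0.5] * (size - 1)
x, k, n = 0.37, 2, 5
Q = q_polynomials(x, p, q, r, size)
row = n_step_row(k, n, p, q, r, size)
lhs = x ** n * Q[k]                                   # left side of (4.13)
rhs = sum(row[j] * Q[j] for j in range(size))         # right side of (4.13)
```

Because k + n is smaller than the truncation size, no probability mass reaches the cut-off, and `lhs` and `rhs` should agree up to rounding.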


5: Examples 


(a) The Symmetric Random Walk with Reflecting Barrier 


In order to bring the computations of Section 4 into the general form involving
orthogonal polynomials, we have only to put
\[ Q_k(x) = \cos k(\arccos x), \qquad k = 0, 1, \ldots. \]
The $Q_k(x)$ are orthogonal over the interval [-1, 1] with respect to the distribution
$d\sigma(x) = p(x)\,dx$, where $p(x) = (1/\pi)(1 - x^2)^{-1/2}$, since
\[ \int_{-1}^{1} Q_k(x)Q_l(x)p(x)\,dx = C\int_0^{\pi}\cos k\theta\cos l\theta\,d\theta = 0 \qquad \text{if } k \ne l, \]
as the change of variable $x = \cos\theta$ shows.


(b) Another Random Walk with Reflecting Barrier 


As a further example, consider the random walk on the nonnegative integers
whose transition probability matrix is

\[ P = \begin{Vmatrix}
0 & 1 & 0 & 0 & 0 & \cdots \\
q & 0 & p & 0 & 0 & \cdots \\
0 & q & 0 & p & 0 & \cdots \\
\vdots & & & & & \ddots
\end{Vmatrix} \]

with q, p > 0, q + p = 1.
Multiplying both sides of relation (4.1) by $2\sqrt{pq}\,(\sqrt{q/p})^k$ we obtain
\[ 2\sqrt{pq}\cos\theta\,(\sqrt{q/p})^k\cos k\theta = p(\sqrt{q/p})^{k+1}\cos(k+1)\theta + q(\sqrt{q/p})^{k-1}\cos(k-1)\theta, \qquad k = 1, 2, \ldots. \]
Thus the polynomials
\[ Q_k(x) = (\sqrt{q/p})^k\cos k\theta, \qquad 2\sqrt{pq}\cos\theta = x, \qquad k = 0, 1, 2, \ldots, \]
satisfy the system of equations (4.9) corresponding to the matrix P above except
for k = 0. Here $Q_0(x) = 1$ and $Q_1(x) = x/2p$, whereas we want the initial
conditions $Q_0(x) = 1$, $Q_1(x) = x$.

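To remedy this problem, we start with the identity
\[ \cos\theta\sin(k+1)\theta = \tfrac12\sin k\theta + \tfrac12\sin(k+2)\theta, \qquad k = 0, 1, 2, \ldots. \tag{5.2} \]
Multiplying both sides by $2\sqrt{pq}\,(\sqrt{q/p})^k$ and dividing by $\sin\theta$, we convert
(5.2) into
\[ 2\sqrt{pq}\,(\cos\theta)(\sqrt{q/p})^k\frac{\sin(k+1)\theta}{\sin\theta}
= q(\sqrt{q/p})^{k-1}\frac{\sin k\theta}{\sin\theta} + p(\sqrt{q/p})^{k+1}\frac{\sin(k+2)\theta}{\sin\theta}, \qquad k = 1, 2, \ldots. \]
Let
\[ Z_k(\theta) = (\sqrt{q/p})^k\frac{\sin(k+1)\theta}{\sin\theta}, \qquad k = 0, 1, \ldots. \]
Then
\[ Z_0(\theta) = 1 \quad \text{and} \quad Z_1(\theta) = \sqrt{q/p}\,\frac{\sin 2\theta}{\sin\theta}, \]
while
\[ 2\sqrt{pq}\,(\cos\theta)Z_k(\theta) = qZ_{k-1}(\theta) + pZ_{k+1}(\theta), \qquad k = 1, 2, \ldots. \]
Let
\[ R_k(x) = Z_k(\theta); \qquad x = 2\sqrt{pq}\cos\theta. \]
Note that $R_0(x) = 1$ and $R_1(x) = x/p$, while
\[ xR_k(x) = qR_{k-1}(x) + pR_{k+1}(x), \qquad k = 1, 2, \ldots. \]
It also follows that $R_k(x)$ is a polynomial of degree k. Finally, let $P_k(x) =
(2p - 1)R_k(x) + (2 - 2p)Q_k(x)$, k = 0, 1, .... Then $P_0(x) = 1$, $P_1(x) = x$, and
moreover
\[ xP_k(x) = qP_{k-1}(x) + pP_{k+1}(x), \qquad k = 1, 2, \ldots, \]
since both $R_k(x)$ and $Q_k(x)$ satisfy the same relations. Thus the $P_k(x)$ are the
polynomials corresponding to the transition matrix P.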

The detailed procedure for obtaining the distribution $\sigma(x)$ with respect to
which the $P_k(x)$ are orthogonal on [-1, 1] is beyond the scope of this book.
Therefore, we will simply present it, leaving to the reader the verification that it
enjoys the desired properties. If $p \ge \tfrac12$, then $\sigma(x)$ is constant outside the interval
$[-\sqrt{4pq}, \sqrt{4pq}]$, and in that interval
\[ d\sigma(x) = C\,\frac{\sqrt{4pq - x^2}}{1 - x^2}\,dx. \]
If $p < \tfrac12$, then $\sigma(x)$ has, in addition to the “density” specified above for the
interval $[-\sqrt{4pq}, \sqrt{4pq}]$, two “jumps” of magnitude (1 - 2p)/q at the points
-1 and +1.

The constant C serves as a normalizing factor to guarantee that $\int_{-1}^{1} d\sigma(x) = 1$.


(c) Random Walk with Absorbing Barrier 


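Next we discuss the problem of random walks on the integers -1, 0, 1, 2, 3, ...
with probability $\tfrac12$ of going from a state $k \ge 0$ to each of its two neighboring
states and with an absorbing barrier at state -1; that is, the transition
probability matrix, with rows and columns indexed by the states -1, 0, 1, 2, ..., is

\[ P = \begin{Vmatrix}
1 & 0 & 0 & 0 & \cdots \\
\tfrac12 & 0 & \tfrac12 & 0 & \cdots \\
0 & \tfrac12 & 0 & \tfrac12 & \cdots \\
\vdots & & & & \ddots
\end{Vmatrix}. \]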

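Although P is not of the form considered in the general method, we can still
follow a procedure analogous to that used in Section 4.
The key to our analysis is the identity
\[ \cos\theta\sin(k+1)\theta = \tfrac12\sin k\theta + \tfrac12\sin(k+2)\theta, \qquad k = 0, 1, 2, \ldots. \tag{5.3} \]
Since the kth row of P consists of elements
\[ \begin{aligned}
&P_{k,-1} = 0, \ P_{k0} = 0, \ \ldots, \ P_{k,k-1} = \tfrac12, \ P_{kk} = 0, \ P_{k,k+1} = \tfrac12, \ P_{k,k+2} = 0, \ \ldots, \qquad k = 1, 2, \ldots, \\
&P_{0,-1} = \tfrac12, \ P_{00} = 0, \ P_{01} = \tfrac12, \ P_{02} = 0, \ \ldots,
\end{aligned} \]
the relation can be written, for k = 0, 1, ..., as
\[ \cos\theta\sin(k+1)\theta = \sum_{r=-1}^{\infty} P_{kr}\sin(r+1)\theta. \]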


Multiplying this by $\cos\theta$ and substituting
\[ \cos\theta\sin(r+1)\theta = \sum_{s=-1}^{\infty} P_{rs}\sin(s+1)\theta \]
into the right side of the resulting equation yields
\[ \begin{aligned}
\cos^2\theta\sin(k+1)\theta &= \sum_{r=-1}^{\infty} P_{kr}\sum_{s=-1}^{\infty} P_{rs}\sin(s+1)\theta \\
&= \sum_{s=-1}^{\infty}\sin(s+1)\theta\sum_{r=-1}^{\infty} P_{kr}P_{rs} \\
&= \sum_{s=-1}^{\infty} P_{ks}^2\sin(s+1)\theta.
\end{aligned} \]
Repeating this procedure n - 1 times leads to
\[ \cos^n\theta\sin(k+1)\theta = \sum_{r=-1}^{\infty} P_{kr}^n\sin(r+1)\theta. \]


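Now multiply both sides of this equation by $\sin(s+1)\theta$ and integrate with
respect to $\theta$ over $[0, 2\pi]$:
\[ \begin{aligned}
\int_0^{2\pi}\cos^n\theta\sin(k+1)\theta\sin(s+1)\theta\,d\theta
&= \int_0^{2\pi}\sum_{r=-1}^{\infty} P_{kr}^n\sin(r+1)\theta\sin(s+1)\theta\,d\theta \\
&= \sum_{r=-1}^{\infty} P_{kr}^n\int_0^{2\pi}\sin(r+1)\theta\sin(s+1)\theta\,d\theta \qquad (s = 0, 1, \ldots).
\end{aligned} \tag{5.4} \]
Using elementary trigonometric identities, it is easily shown that
\[ \int_0^{2\pi}\sin(r+1)\theta\sin(s+1)\theta\,d\theta =
\begin{cases}
0 & \text{if } r \ne s, \\
\pi & \text{if } r = s
\end{cases}
\qquad (s = 0, 1, 2, \ldots). \tag{5.5} \]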


It follows from (5.4) and (5.5) that we can express the n-step transition
probabilities by the formula
\[ P_{ks}^n = \frac{1}{\pi}\int_0^{2\pi}\cos^n\theta\sin(k+1)\theta\sin(s+1)\theta\,d\theta,
\qquad k, s = 0, 1, 2, \ldots, \quad n = 0, 1, \ldots. \tag{5.6} \]


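This is just the general method as applied to the matrix

\[ \begin{Vmatrix}
0 & \tfrac12 & 0 & 0 & \cdots \\
\tfrac12 & 0 & \tfrac12 & 0 & \cdots \\
0 & \tfrac12 & 0 & \tfrac12 & \cdots \\
\vdots & & & & \ddots
\end{Vmatrix} \]

obtained by deleting the first row and column from

\[ P = \begin{Vmatrix}
1 & 0 & 0 & 0 & \cdots \\
\tfrac12 & 0 & \tfrac12 & 0 & \cdots \\
0 & \tfrac12 & 0 & \tfrac12 & \cdots \\
\vdots & & & & \ddots
\end{Vmatrix}. \]

The validity of its use in computing the transition probabilities $P_{ks}^n$ for k,
s = 0, 1, ... is based on the observation that in computing $P_{ks}^n$ it is not necessary
to consider any path which leads to state -1, since such a path cannot leave this
state. The orthogonal polynomials are
\[ Q_k(x) = \frac{\sin(k+1)\theta}{\sin\theta}, \qquad k = 0, 1, \ldots, \]
where $x = \cos\theta$, and it is easy to check that the functions $Q_k(x)$ are orthogonal
with respect to $d\sigma(x) = (2\pi)^{-1}(1 - x^2)^{1/2}\,dx$ over [-1, 1].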

As an application of the preceding result let us compute the probability that
starting from state k absorption into state -1 occurs exactly at the nth transition.
Absorption into state -1 can obviously occur at the nth step only if the process
is in state 0 at the (n - 1)th step and then absorption occurs at the next step. But
the probability of being in state 0 at the (n - 1)th step, having started from
state k, is, by (5.6),
\[ P_{k0}^{n-1} = \frac{1}{\pi}\int_0^{2\pi}\cos^{n-1}\theta\sin(k+1)\theta\sin\theta\,d\theta, \]
while $P_{0,-1} = \tfrac12$. Hence the probability of absorption into state -1 at time n
starting from state k is
\[ A_k^n = \frac{1}{2\pi}\int_0^{2\pi}\cos^{n-1}\theta\sin(k+1)\theta\sin\theta\,d\theta. \tag{5.7} \]
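Formula (5.7) can be compared with a direct computation of the absorption probabilities. The Python sketch below is an illustration only; the function names and the grid size are assumptions made for the example. The first function follows the walk step by step on the states 0, 1, 2, ..., discarding mass that has been absorbed, and the second evaluates the integral in (5.7) by the rectangle rule, which is exact up to rounding for trigonometric polynomials of degree below the grid size.

```python
import math

def absorption_direct(n, k):
    """Pr{absorption into -1 occurs exactly at step n | start at k}, by recursion."""
    dist = {k: 1.0}                     # mass on states >= 0 not yet absorbed
    for _ in range(n - 1):
        new = {}
        for j, mass in dist.items():
            new[j + 1] = new.get(j + 1, 0.0) + mass / 2
            if j >= 1:                  # a step down from 0 is absorption, so it is dropped
                new[j - 1] = new.get(j - 1, 0.0) + mass / 2
        dist = new
    return dist.get(0, 0.0) / 2         # be at 0 after n - 1 steps, then step to -1

def absorption_integral(n, k, grid=4096):
    """Right-hand side of (5.7), evaluated by the rectangle rule on [0, 2*pi]."""
    h = 2 * math.pi / grid
    total = sum(math.cos(t * h) ** (n - 1)
                * math.sin((k + 1) * t * h) * math.sin(t * h)
                for t in range(grid))
    return total * h / (2 * math.pi)

# Largest discrepancy over a small range of n and k.
gap = max(abs(absorption_direct(n, k) - absorption_integral(n, k))
          for n in range(1, 10) for k in range(5))
```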


6: Applications to Coin Tossing 


The random walks discussed above are related to coin-tossing problems.
Suppose that two gamblers agree to carry out a series of coin tossings with a
fair coin. They agree that each time the coin shows heads gambler I wins one unit
from gambler II; otherwise he loses one unit to gambler II. Let
\[ X_i = \begin{cases} +1 & \text{if gambler I wins,} \\ -1 & \text{if gambler I loses,} \end{cases} \]
at the ith toss of the coin. Then $\Pr\{X_i = +1\} = \Pr\{X_i = -1\} = \tfrac12$ and
$S_n = \sum_{i=1}^{n} X_i$ ($n \ge 1$) is the net gain of gambler I after n tosses of the coin.
Further, set $S_0 = 0$. One of the simplest questions that can be asked concerning
this contest is the following: What is the probability that after n tosses of the coin
gambler I's net gain will be zero? Clearly gambler I's net gain cannot be zero if n
is odd; hence we consider only the case n = 2m. Now if I's net gain is zero in 2m
trials, evidently he won in m trials and lost in m trials. The desired probability is
clearly
\[ \mu_{2m} = \binom{2m}{m}2^{-2m}. \]

Next, what is the probability that the net gain of gambler I will equal zero for
the first time after n = 2m tosses of the coin? Clearly $S_n$ describes a symmetric
random walk process on all the integers. Therefore our question can be formulated
this way: What is the probability $f_{00}^n$ of a first return to zero occurring at
the nth step? Starting from zero the first step can be +1 or -1 with probability $\tfrac12$
for either choice. Because of the obvious symmetry about the origin the probability
of first return to zero from +1 must equal that from -1 and so our
question will be answered if we find the probability of first passage to zero from
state +1 in 2m - 1 steps. But this must equal the probability of first passage to
state -1 starting from state 0 in 2m - 1 steps, because of the homogeneous
nature of the process. This equals the probability $A_0^{2m-1}$ of absorption into state
-1 in 2m - 1 steps starting from state 0 in a random walk process on the integers
{-1, 0, 1, 2, ...} with absorbing barrier at state -1, which is given by formula
(5.7) with k = 0 and n = 2m - 1:
\[ A_0^{2m-1} = \frac{1}{2\pi}\int_0^{2\pi}\cos^{2m-2}\theta\sin^2\theta\,d\theta = \frac{1}{\pi}\int_0^{\pi}\cos^{2m-2}\theta\sin^2\theta\,d\theta. \]


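To evaluate the latter integral make the substitution $x = \cos\theta$. Then
\[ A_0^{2m-1} = \frac{1}{\pi}\int_{-1}^{1} x^{2m-2}(1 - x^2)^{1/2}\,dx
= \frac{1}{\pi}\int_0^{1}(x^2)^{m-3/2}(1 - x^2)^{1/2}\,2x\,dx. \]
Next substitute $t = x^2$. Then
\[ A_0^{2m-1} = \frac{1}{\pi}\int_0^{1} t^{m-3/2}(1 - t)^{1/2}\,dt = \frac{1}{\pi}B\bigl(m - \tfrac12, \tfrac32\bigr), \tag{6.1} \]
where
\[ B(\alpha, \beta) = \int_0^{1} t^{\alpha-1}(1 - t)^{\beta-1}\,dt \]
is the beta function, which can be expressed in terms of the gamma function:
\[ B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}. \]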
Then
\[ A_0^{2m-1} = \frac{1}{\pi}\,\frac{\Gamma(m - \tfrac12)\Gamma(\tfrac32)}{\Gamma(m+1)}
= \frac{1}{\pi}\,\frac{(m - \tfrac32)(m - \tfrac52)\cdots\tfrac12\,\Gamma(\tfrac12)\cdot\tfrac12\,\Gamma(\tfrac12)}{m(m-1)\cdots 2\cdot 1} \]
by well-known properties of the gamma function. Since $\Gamma(\tfrac12) = \sqrt{\pi}$ we find that
the probability that the total gain of gambler I will equal zero for the first time
after 2m tosses of the coin is given by
\[ f_{00}^{2m} =
\begin{cases}
\tfrac12 & \text{for } m = 1, \\[1ex]
\dfrac{(2m-3)(2m-5)\cdots 3\cdot 1}{2^m\,m!} & \text{for } m \ge 2.
\end{cases} \tag{6.2} \]
A straightforward computation yields the following interesting result:
\[ f_{00}^{2m} = \mu_{2m-2} - \mu_{2m}, \qquad m = 1, 2, \ldots, \]
where we define $\mu_0 = 1$.
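The first-return probabilities can be checked in several independent ways. The Python sketch below is an illustration only; the function names are assumptions made for the example. It evaluates the closed form (6.2), the difference of two consecutive probabilities of a zero net gain (with $\binom{2m}{m}2^{-2m}$ for the probability of a zero net gain after 2m tosses), the beta-function expression (6.1), and a brute-force enumeration of all $2^{2m}$ equally likely sequences of tosses.

```python
from fractions import Fraction
from itertools import product
from math import comb, gamma, pi

def zero_gain(m):
    """Pr{S_{2m} = 0} for the fair coin, equal to C(2m, m) / 2^(2m)."""
    return Fraction(comb(2 * m, m), 4 ** m)

def first_return_closed_form(m):
    """f_00^{2m} from (6.2); the empty product reproduces the case m = 1."""
    odd_product = 1
    for j in range(1, 2 * m - 2, 2):    # 1 * 3 * ... * (2m - 3)
        odd_product *= j
    denominator = 2 ** m
    for j in range(2, m + 1):           # multiply by m!
        denominator *= j
    return Fraction(odd_product, denominator)

def first_return_beta(m):
    """f_00^{2m} from (6.1): (1/pi) * B(m - 1/2, 3/2), in floating point."""
    return gamma(m - 0.5) * gamma(1.5) / (gamma(m + 1) * pi)

def first_return_enumerated(m):
    """Fraction of the 2^(2m) toss sequences whose first zero is at step 2m."""
    hits = 0
    for steps in product((1, -1), repeat=2 * m):
        s, first_zero = 0, None
        for i, x in enumerate(steps, start=1):
            s += x
            if s == 0:
                first_zero = i
                break
        hits += first_zero == 2 * m
    return Fraction(hits, 4 ** m)

table = [(m, first_return_closed_form(m)) for m in range(1, 6)]
```

The enumeration is only practical for small m, which is why the closed forms are the useful ones.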


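Summing over $k \ge m + 1$ gives
\[ \begin{aligned}
\sum_{k=m+1}^{\infty} f_{00}^{2k} &= \sum_{k=m+1}^{\infty}(\mu_{2k-2} - \mu_{2k}) = \mu_{2m} - \lim_{n\to\infty}\mu_{2n} \\
&= \mu_{2m} - \lim_{n\to\infty}\binom{2n}{n}2^{-2n} = \mu_{2m}.
\end{aligned} \tag{6.3} \]
Interpreting the two ends of (6.3), we have
\[ \mu_{2m} = \Pr\{S_{2m} = 0\} = \Pr\{S_1 \ne 0, S_2 \ne 0, \ldots, S_{2m} \ne 0\}. \tag{6.4} \]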


Next we want to answer the question, What is the probability that gambler I
will have a net gain zero for the kth time after 2m tosses of the coin (k > 1)? In
terms of the random walk described by the process $S_n$, this question can be formulated
in the following way: What is the probability that the kth return to the
origin will occur at the 2mth step? Again our first step from state zero can be
taken to state +1 and we can similarly assume that after each of the first k - 1
returns to the origin we always make our next step to state +1. Now, since our
random walk is homogeneous, we can freely interchange intermediate steps
without affecting the probability of reaching one state from another. This way
we can “take initially” our first step “to the right” (to state +1), and all steps
to state +1 after each of the first k - 1 returns to the origin, thus reaching state
k in the first k steps. Then the required probability will be the probability of
reaching state zero for the first time in 2m - k steps from state k or reaching state
-1 for the first time in 2m - k steps from state k - 1. The latter probability,
however, is that of absorption into state -1 at the (2m - k)th step starting from
state k - 1, in the random walk process on the integers {-1, 0, 1, 2, ...} with
absorbing barrier at state -1, and is given by formula (5.7) with n = 2m - k
and k - 1 in place of k:
\[ A_{k-1}^{2m-k} = \frac{1}{2\pi}\int_0^{2\pi}\cos^{2m-k-1}\theta\sin k\theta\sin\theta\,d\theta. \tag{6.5} \]
The value of this integral is
\[ A_{k-1}^{2m-k} = \frac{1}{2^{2m-k}}\binom{2m-k}{m}\frac{k}{2m-k}. \tag{6.6} \]
(For a validation of (6.6) the reader may consult Problems 6 and 7.)
As a further question, we consider the sequence $S_1, S_2, \ldots, S_{2m}$ and ask for the
probability that exactly k members of this sequence will equal zero. This will be
the probability $z_{k,2m}$ that in 2m tosses of the coin gambler I's net gain will be zero
exactly k times.

By (6.4) $z_{0,2m} = \mu_{2m}$. Now we evaluate $z_{1,2m}$. Let $B_r$ be the event that among
$S_1, \ldots, S_{2m}$ only $S_{2r}$ vanishes. Then for $r \le m$
\[ \begin{aligned}
B_r &= \{S_1 \ne 0, \ldots, S_{2r-1} \ne 0, S_{2r} = 0, S_{2r+1} \ne 0, \ldots, S_{2m} \ne 0\} \\
&= \{S_1 \ne 0, \ldots, S_{2r-1} \ne 0, S_{2r} = 0\} \cap \{S_{2r+1} - S_{2r} \ne 0, \ldots, S_{2m} - S_{2r} \ne 0\}.
\end{aligned} \]
Now clearly the events $\{S_1 \ne 0, \ldots, S_{2r-1} \ne 0, S_{2r} = 0\}$ and $\{S_{2r+1} - S_{2r} \ne 0,
\ldots, S_{2m} - S_{2r} \ne 0\}$ are independent, and the probability of the latter is
just $\mu_{2m-2r}$. Thus
\[ \Pr\{B_r\} = f_{00}^{2r}\,z_{0,2m-2r}, \qquad 1 \le r \le m, \]
where $z_{0,0} = 1$ by definition and so
\[ z_{1,2m} = \sum_{r=1}^{m}\Pr\{B_r\} = \sum_{r=1}^{m} f_{00}^{2r}\,z_{0,2m-2r}. \tag{6.7} \]


But $\mu_{2m-2r}$ is the probability of gambler I's net gain being zero after 2m - 2r
coin tossings and, as pointed out before, is equal to $\Pr\{S_{2m} - S_{2r} = 0\}$. But
the events $\{S_1 \ne 0, \ldots, S_{2r-1} \ne 0, S_{2r} = 0\}$ and $\{S_{2m} - S_{2r} = 0\}$ are also
independent, and so we obtain finally
\[ \Pr\{B_r\} = \Pr\{S_1 \ne 0, \ldots, S_{2r-1} \ne 0, S_{2r} = 0, S_{2m} - S_{2r} = 0\}. \]
The events on the right-hand side are mutually exclusive and their union is the
event $\{S_{2m} = 0\}$; hence
\[ \sum_{r=1}^{m}\Pr\{B_r\} = \Pr\{S_{2m} = 0\} = \mu_{2m}. \]
Thus $z_{1,2m} = \mu_{2m} = z_{0,2m}$ for $m \ge 1$. In the same way we can show that
\[ z_{k,2m} = \sum_{r=1}^{m-k+1} f_{00}^{2r}\,z_{k-1,2m-2r}, \qquad k \ge 2, \ m \ge 1. \tag{6.8} \]
Comparing this with (6.7) and using $z_{1,2} = z_{0,2}$ and the property that
$f_{00}^{2m} = \mu_{2m-2} - \mu_{2m} = z_{0,2m-2} - z_{1,2m}$ we obtain
\[ z_{2,2m} = z_{1,2m} - f_{00}^{2m} = 2z_{1,2m} - z_{0,2m-2}, \qquad m \ge 1. \]


22 10. ALGEBRAIC METHODS 


Substituting this into (6.8), by induction we obtain the recursion relation
\[ z_{k,2m} = 2z_{k-1,2m} - z_{k-2,2m-2}. \tag{6.9} \]
If we write $z_{k,2m} = 2^{k-2m}a_{k,2m}$, (6.9) reduces to
\[ a_{k-1,2m} = a_{k,2m} + a_{k-2,2m-2}. \tag{6.10} \]
These recursion relations are satisfied by
\[ a_{k,2m} = \binom{2m-k}{m}. \tag{6.11} \]
Now $z_{0,2m}$ and $z_{1,2m}$ are known, and direct substitution shows that $a_{0,2m}$ and
$a_{1,2m}$ are given by (6.11). Clearly (6.10) uniquely determines $a_{k,2m}$ for $k \ge 2$;
hence it follows that (6.11) is the correct evaluation of $a_{k,2m}$, and so
\[ z_{k,2m} = 2^{k-2m}\binom{2m-k}{m}. \tag{6.12} \]
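Formula (6.12) can be confirmed for small m by listing every sequence of tosses. The Python sketch below is an illustration only; the function names are assumptions made for the example. It counts the zeros among $S_1, \ldots, S_{2m}$ over all $2^{2m}$ equally likely sequences and compares the resulting distribution with (6.12), using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product
from math import comb

def zero_count_distribution(m):
    """Exact distribution of the number of zeros among S_1, ..., S_{2m}."""
    counts = {}
    for steps in product((1, -1), repeat=2 * m):
        s, zeros = 0, 0
        for x in steps:
            s += x
            zeros += s == 0
        counts[zeros] = counts.get(zeros, 0) + 1
    return {k: Fraction(c, 4 ** m) for k, c in counts.items()}

def z_formula(k, m):
    """z_{k,2m} as given by (6.12); comb returns 0 when 2m - k < m."""
    return Fraction(comb(2 * m - k, m), 2 ** (2 * m - k))

# Compare enumeration and formula for 2m = 8 tosses.
enumerated = zero_count_distribution(4)
agrees = all(enumerated.get(k, 0) == z_formula(k, 4) for k in range(5))
```

For 2m = 4 both computations give 3/8, 3/8, and 1/4 for k = 0, 1, 2, which also shows the equality of the first two values noted in the text.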


To answer another question about the sequence $S_1, \ldots, S_{2n}$ we define the
concept of a sign change in the sequence. We say that a sign change occurs at
time k if $S_k = 0$ and $S_{k-1}S_{k+1} = -1$. Then we ask the question, What is the
probability that there are exactly r sign changes in the sequence $S_1, \ldots, S_{2n}$? We
will have r sign changes in the sequence if among the $S_2, S_4, \ldots, S_{2n-2}$ there are
exactly k zeros (k = r, r + 1, ..., n - 1) and at exactly r out of these k states the
process changes direction, i.e., the required probability is
\[ C_{r,2n} = \sum_{k=r}^{n-1}\Pr\{\text{exactly } k \text{ of } S_2, S_4, \ldots, S_{2n-2} \text{ are zero}\}
\times \Pr\{r \text{ out of } k \text{ times we change direction at zero}\}. \]
But
\[ \begin{aligned}
\Pr\{\text{exactly } k \text{ of } S_2, S_4, \ldots, S_{2n-2} \text{ are zero}\}
&= \Pr\{\text{exactly } k \text{ of } S_1, S_2, S_3, S_4, \ldots, S_{2n-2} \text{ are zero}\} \\
&= \binom{2n-k-2}{n-1}2^{k-2n+2}
\end{aligned} \]
as is given by formula (6.12). Further, a change in direction at a zero can occur
with probability $\tfrac12$. Hence
\[ \Pr\{r \text{ out of } k \text{ times we change directions at zero}\}
= \binom{k}{r}\Bigl(\frac12\Bigr)^r\Bigl(\frac12\Bigr)^{k-r} = \binom{k}{r}2^{-k}, \]
so
\[ \begin{aligned}
C_{r,2n} &= \sum_{k=r}^{n-1}\binom{2n-k-2}{n-1}2^{k-2n+2}\binom{k}{r}2^{-k} \\
&= 2^{-2n+2}\sum_{k=r}^{n-1}\binom{2n-k-2}{n-1}\binom{k}{r} \\
&= 2^{-2n+2}\sum_{j=0}^{n-r-1}\binom{n-1+j}{n-1}\binom{n-1-j}{r} \\
&= 2^{-2n+2}\sum_{j=0}^{n-r-1}\binom{n-1-j}{n-r-j-1}\binom{n-1+j}{j}.
\end{aligned} \]

Elementary Problems 
1. Verify the statement of Theorem 2.1 for the Markov matrix
\[ P = \begin{Vmatrix} 1 - p & p \\ q & 1 - q \end{Vmatrix}, \qquad \text{where } 0 < p < 1 \text{ and } 0 < q < 1. \]
For what values of p and q is there exactly one recurrent class in the two-state Markov
chain corresponding to this matrix?


2. Let P, Q be finite probability transition matrices of order n such that PQ = I = QP. 
Show that P and Q are permutation matrices, i.e., matrices with only one nonzero entry in 
any row or column. 


3. Suppose we have a two-state Markov chain $\{X_n, n \ge 0\}$ with states 0 and 1, and $P_{00} =
1 - \alpha$, $P_{01} = \alpha$, $P_{11} = 1 - \beta$, $P_{10} = \beta$ ($0 < \alpha, \beta < 1$). Let N be the first index $n \ge 1$ such
that $X_{n-1} = X_n = 0$, and let $d_0 = E[N \mid X_0 = 0]$. Prove that
\[ d_0 = 1 + \frac{\alpha}{1 - \alpha}\cdot\frac{1 + \beta}{\beta}. \]
Hint: Consider the relationships between $d_0$ and $d_1 = E[N \mid X_0 = 1]$.


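4. Consider the 3 x 3 Markov transition matrix
\[ P = \begin{Vmatrix}
\tfrac12 & \tfrac12 & 0 \\
\tfrac12 & 0 & \tfrac12 \\
0 & \tfrac12 & \tfrac12
\end{Vmatrix}. \]
Determine the corresponding eigenvalues and right and left eigenvectors and thereby
establish the spectral representation $P = \Phi\Lambda\Psi$, where
\[ \Phi = \begin{Vmatrix}
1/\sqrt3 & 1/\sqrt2 & 1/\sqrt6 \\
1/\sqrt3 & 0 & -2/\sqrt6 \\
1/\sqrt3 & -1/\sqrt2 & 1/\sqrt6
\end{Vmatrix}, \qquad
\Psi = \begin{Vmatrix}
1/\sqrt3 & 1/\sqrt3 & 1/\sqrt3 \\
1/\sqrt2 & 0 & -1/\sqrt2 \\
1/\sqrt6 & -2/\sqrt6 & 1/\sqrt6
\end{Vmatrix}, \]
and
\[ \Lambda = \begin{Vmatrix}
1 & 0 & 0 \\
0 & \tfrac12 & 0 \\
0 & 0 & -\tfrac12
\end{Vmatrix}. \]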




5. Let $\{X_n, n \ge 0\}$ be a Markov chain on the states 0, 1, ..., N. Suppose that states
0, ..., r - 1 are transient while states r, ..., N are absorbing ($p_{ii} = 1$ for $r \le i \le N$). The
transition matrix has the form
\[ P = \begin{Vmatrix} Q & R \\ 0 & I \end{Vmatrix}, \]
where 0 is an (N - r + 1) x r matrix all of whose components are zero, I is an (N - r + 1)
x (N - r + 1) identity matrix and $q_{ij} = p_{ij}$ for $0 \le i, j < r$.

(a) Show that the n-step transition matrix can be written as
\[ P^n = \begin{Vmatrix} Q^n & (I + Q + \cdots + Q^{n-1})R \\ 0 & I \end{Vmatrix}. \]

(b) For transient states i and j, let $n_{ij}$ be the mean total number of visits to state j
before absorption, conditioned on $X_0 = i$, and let N be the r x r matrix whose elements are
$n_{ij}$. Show that $n_{ij} = \sum_{n=0}^{\infty} p_{ij}^{(n)}$, whence $N = I + Q + Q^2 + \cdots$. Establish that the matrix
I - Q is invertible with inverse $N = (I - Q)^{-1}$ by directly verifying that N(I - Q) = I.

(c) For transient state i and absorbing state j, let $b_{ij}$ be the probability of absorption
in j conditioned on $X_0 = i$. Show that $b_{ij} = \lim_{n\to\infty} p_{ij}^{(n)}$ for $0 \le i < r$ and $r \le j \le N$,
whence B = NR where B is the r x (N - r + 1) matrix having elements $b_{ij}$.

Remark: N is called the fundamental matrix of the chain.
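A small numerical instance may help fix the notation of this problem. The Python sketch below is an illustration only: the chain (a symmetric walk with two absorbing end states and three transient states), the function names, and the hand-written Gauss-Jordan routine are all assumptions made for the example. It forms the fundamental matrix $N = (I - Q)^{-1}$ and the absorption probabilities B = NR with exact rational arithmetic.

```python
from fractions import Fraction

def inverse(A):
    """Gauss-Jordan inverse of a nonsingular square matrix of rationals."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

half = Fraction(1, 2)
# Hypothetical chain: three transient states in a row, absorbing states at both ends.
Q = [[0, half, 0], [half, 0, half], [0, half, 0]]     # transient -> transient
R = [[half, 0], [0, 0], [0, half]]                    # transient -> absorbing
I_minus_Q = [[Fraction(int(i == j)) - Q[i][j] for j in range(3)]
             for i in range(3)]
N = inverse(I_minus_Q)    # mean numbers of visits, part (b)
B = matmul(N, R)          # absorption probabilities, part (c)
```

In this instance each row of B sums to 1, as it must when absorption is certain.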

6. Let $P = \|p_{ij}\|_{i,j=1}^{N}$ be the transition matrix of an irreducible aperiodic Markov chain
$\{X_n, n \ge 0\}$. Let $T_j = \min\{n \ge 1 : X_n = j\}$ be the first hitting time to state j and let $m_{ij} =
E[T_j \mid X_0 = i]$.

(a) Establish the formula (*) $M = \mathbf{1} + P(M - M_d)$ where $\mathbf{1}$ is an N x N matrix all
of whose elements are 1, $M = \|m_{ij}\|_{i,j=1}^{N}$ and
\[ M_d = \begin{Vmatrix}
m_{11} & 0 & \cdots & 0 \\
0 & m_{22} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & m_{NN}
\end{Vmatrix}. \]

(b) By multiplying Eq. (*) on the left by the stationary distribution $\pi$ and using $\pi = \pi P$
show that $\pi M_d = (1, 1, \ldots, 1)$, whence $\pi_i = 1/m_{ii}$ for i = 1, ..., N.


7. Determine $P^{(n)}$ for the Markov chain with transition probabilities
\[ P_{ij} =
\begin{cases}
\dfrac{1}{(i+1)(i+2)}, & j < i + 1, \\[2ex]
\dfrac{i+1}{i+2}, & j = i + 1, \\[2ex]
0, & j > i + 1.
\end{cases} \]
Hint: Use induction.
Answer:
\[ P_{ij}^{(n)} =
\begin{cases}
\dfrac{n}{(i+n)(i+n+1)}, & j < i + n, \\[2ex]
\dfrac{i+1}{i+n+1}, & j = i + n, \\[2ex]
0, & j > i + n.
\end{cases} \]
Problems 


1. Consider a finite state Markov chain $\{X_n\}$, n = 0, 1, ..., with two classes, one of which
is an absorbing state. For definiteness, let 0 denote the absorbing barrier and let i = 1,
2, ..., N represent the states of the other class. Let the eigenvalues of the transition
probability matrix arranged in decreasing order be $\lambda_0 = 1 > |\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots$. (Note:
we are assuming that $1 > |\lambda_1| > |\lambda_2|$.) Let $b_j$ be the limiting probability of being in state j
given that absorption into state 0 has not occurred and the initial state is i, i.e., $b_j =
\lim_{n\to\infty}\Pr\{X(n) = j \mid X(n) \ne 0, X(0) = i\}$. (See page 6.) Determine the rate of approach to
zero of
\[ \frac{P_{ij}^n}{1 - P_{i0}^n} - b_j, \qquad j = 1, 2, \ldots, N \quad (i \ge 1). \]


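2. Consider the finite random walk on the integers 0, 1, 2, ..., N represented by the matrix
of one-step transition probabilities
\[ P = \begin{Vmatrix}
0 & 1 & 0 & 0 & \cdots & & \\
\tfrac12 & 0 & \tfrac12 & 0 & \cdots & & \\
0 & \tfrac12 & 0 & \tfrac12 & \cdots & & \\
\vdots & & & & \ddots & & \\
 & & & \cdots & \tfrac12 & 0 & \tfrac12 \\
 & & & \cdots & 0 & 1 & 0
\end{Vmatrix}. \]
Find the formula for the r-step transition probabilities using the method of orthogonal
polynomials.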


3. Consider the same random walk as in Problem 2, but now ignore states 0 and N. (States
0 and N act as absorbing barriers.) Then the (N - 1) x (N - 1) matrix of the one-step
transition probabilities corresponding to the transient states will be
\[ P = \begin{Vmatrix}
0 & \tfrac12 & 0 & 0 & \cdots & 0 & 0 \\
\tfrac12 & 0 & \tfrac12 & 0 & \cdots & 0 & 0 \\
0 & \tfrac12 & 0 & \tfrac12 & \cdots & 0 & 0 \\
\vdots & & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & \tfrac12 & 0
\end{Vmatrix}. \]
Find the formula for the r-step transition probabilities, given that absorption has not taken
place.
Hint: The relevant orthogonal polynomials are $Q_n(x) = \sin n\theta$, $x = \cos\theta$, where $\theta$
varies over the finite set $\theta = k\pi/N$, k = 0, 1, ..., 2N - 1.


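4. Consider the random walk on the circle, i.e., along N + 1 points numbered 0, 1, 2, ..., N,
placed symmetrically along the circumference of a circle. Let the matrix of first-step
transition probabilities be given by
\[ P = \begin{Vmatrix}
0 & \tfrac12 & 0 & 0 & \cdots & 0 & 0 & \tfrac12 \\
\tfrac12 & 0 & \tfrac12 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & \ddots & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & \tfrac12 & 0 & \tfrac12 \\
\tfrac12 & 0 & 0 & 0 & \cdots & 0 & \tfrac12 & 0
\end{Vmatrix}. \]
Find the formula for the r-step transition probabilities, $P^r = \|P_{ij}^{(r)}\|$.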


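5. Consider the random walk on the circle as in Problem 4, but let the matrix of first-step
transition probabilities be given by
\[ P = \begin{Vmatrix}
0 & p & 0 & \cdots & 0 & q \\
q & 0 & p & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & p \\
p & 0 & 0 & \cdots & q & 0
\end{Vmatrix}. \]
Find the formula for the r-step transition probabilities.

Hint: Let $Z(\theta) = pe^{i\theta} + qe^{-i\theta}$ in the analysis of Problem 4.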
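6. Prove
\[ \frac{1}{\pi}\int_0^{2\pi}\cos^n\theta\cos k\theta\cos l\theta\,d\theta =
\begin{cases}
2^{-n}\left[\dbinom{n}{(n+k-l)/2} + \dbinom{n}{(n+k+l)/2}\right] & \text{if } n + k + l \text{ is even,} \\[2ex]
0 & \text{otherwise.}
\end{cases} \]
(This is $P_{kl}^n$ of formula (4.7) for $l \ne 0$.)
Evaluate $\lim_{n\to\infty}\sqrt{n}\,P_{kl}^n$.

Hint: Let k be a nonnegative integer. Prove
\[ \frac{1}{2\pi}\int_0^{2\pi}\cos^n\theta\cos k\theta\,d\theta =
\begin{cases}
0 & \text{if } n + k \text{ is odd,} \\[1ex]
2^{-n}\dbinom{n}{(n+k)/2} & \text{otherwise.}
\end{cases} \]
To this end, use the identity
\[ \cos^{n+1}\theta\cos(k-1)\theta = \cos^n\theta\cos k\theta + \cos^n\theta\sin(k-1)\theta\sin\theta, \]
integrate by parts, and derive the recursion relation
\[ \int_0^{2\pi}\cos^n\theta\cos k\theta\,d\theta = \frac{n-k+2}{n+1}\int_0^{2\pi}\cos^{n+1}\theta\cos(k-1)\theta\,d\theta. \]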
7. Evaluate
\[ P_{kl}^n = \frac{1}{\pi}\int_0^{2\pi}\cos^n\theta\sin(k+1)\theta\sin(l+1)\theta\,d\theta. \]
Hint: Use the result of Problem 6.


8. Let $P = \|P_{ij}\|$ denote the transition probability matrix of a finite state Markov chain
$\{X_t\}_{t=0}^{\infty}$ consisting of three classes {0}, {1, 2, ..., N - 1}, and {N} of which 0 and N are
absorbing states and the other class is transient. We introduce the family of matrices
\[ P(\theta) = \|P_{ij}e^{\theta(j-i)}\| = \|P_{ij}(\theta)\| \qquad (\theta \text{ real}) \]
and the moment-generating functions $M^{(t)}(\theta \mid k)$,
\[ M^{(t)}(\theta \mid X_0 = k) = E[\exp(\theta(X_t - X_0)) \mid X_0 = k] = e_k P^t(\theta)e = \sum_{j=0}^{N} P_{kj}^t(\theta), \]
where $e_k$ indicates the row vector (0, ..., 1, 0, ..., 0) with a unit in the position k, e is
the N + 1 column vector of unit elements, and $P^t(\theta)$ is the tth power of $P(\theta)$. Let $\pi_{k0}$ and
$\pi_{kN} = 1 - \pi_{k0}$ ($1 \le k \le N - 1$) denote the probability of ultimate absorption in states 0
and N, respectively, for the Markov chain $\{X_t\}_{t=0}^{\infty}$ with initial state k. Prove that
\[ \lim_{t\to\infty} M^{(t)}(\theta \mid k) = \pi_{k0}e^{-k\theta} + \pi_{kN}e^{(N-k)\theta}. \]




9. Under the same notation as in Problem 8, suppose real a and b exist satisfying 


Mal <1 < Mlk) (k=12...,N-D (+) 
[here M(0|k) = M“(0|k)]. Prove that 
ekb >f ka __ 1 
ono Sx S a i 


Hint: Verify M(a|k) < 1 < M(b|k) (k = 1, 2,..., N — 1) holds for all t and use the 
result of the previous problem. 


10. Consider a Markov chain on {0, 1, ..., N} with transition probability matrix
\[ P_{ij} = \binom{N}{j}p_i^j(1 - p_i)^{N-j}, \qquad \text{where } p_i = \frac{(1 + \sigma)i}{N + \sigma i}, \quad 0 < \sigma < 1 \]
(see Example G of Section 2, Chapter 2).
Verify that
\[ M(\theta \mid k) = e^{-k\theta}(p_k e^{\theta} + 1 - p_k)^N \]
[see Problem 8 regarding the definition of $M(\theta \mid k)$].
11. Under the conditions of Problem 10 show that 


l-o 


b=1 
l+o PB TEE 


a = log 
fulfill the requirements of (+) in Problem 9 and therefore deduce the bounds 
{a — o/ + o- 1 i+ of -1 
N Sli Sa a ANN 
K-o + a] s 1 1/1 +o)" -1 


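12. Some Coin-Tossing Relations. Let $\{X_i\}$, $1 \le i < \infty$, be independent, identically
distributed, random variables such that
\[ \Pr\{X_i = +1\} = \Pr\{X_i = -1\} = \tfrac12. \]
Let $S_n = \sum_{i=1}^{n} X_i$ for $1 \le n < \infty$ and
\[ P(m, n) = \Pr\{S_{2j} = 0 \text{ for some } j \text{ satisfying } m \le j < m + n\}. \]
Prove that P(m, n) + P(n, m) = 1 for $m \ge 1$, $n \ge 1$.

Hint: Recall that $\Pr\{S_{2n} = 0\} = \Pr\{S_1 \ne 0, S_2 \ne 0, \ldots, S_{2n} \ne 0\}$. Now assume the result
holds for m = k and arbitrary $n \ge 1$. Then justify the following equalities:
\[ \begin{aligned}
1 - P(k+1, n) &= \Pr\{S_{2j} \ne 0 \text{ for } k \le j < k + n + 1\} \\
&\quad + \Pr\{S_{2k} = 0 \text{ and } S_{2j} \ne 0 \text{ for } k + 1 \le j < k + n + 1\} \\
&= \Pr\{S_{2j} \ne 0 \text{ for } k \le j < k + n + 1\} \\
&\quad + \Pr\{S_{2k} = 0\}\Pr\{S_{2j} \ne 0 \text{ for } 1 \le j < n + 1\} \\
&= \Pr\{S_{2j} = 0 \text{ for some } j \text{ with } n + 1 \le j < k + n + 1\} \\
&\quad + \Pr\{S_{2j} \ne 0 \text{ for } 1 \le j < k + 1\}\Pr\{S_{2n} = 0\} \\
&= \Pr\{S_{2j} = 0 \text{ for some } j \text{ with } n + 1 \le j < k + n + 1\} \\
&\quad + \Pr\{S_{2n} = 0 \text{ and } S_{2j} \ne 0 \text{ for } n + 1 \le j < k + n + 1\} \\
&= P(n, k + 1).
\end{aligned} \]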
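13. Let $\{X_n, n \ge 0\}$ be a Markov chain on the states 0, 1, ..., N. Suppose that states
0, ..., r - 1 are transient while states r, ..., N are absorbing ($p_{ii} = 1$ for $r \le i \le N$). The
transition matrix has the form
\[ P = \begin{Vmatrix} Q & R \\ 0 & I \end{Vmatrix}, \]
where 0 is an (N - r + 1) x r matrix all of whose components are zero, I is an (N - r + 1)
x (N - r + 1) identity matrix and $q_{ij} = p_{ij}$ for $0 \le i, j < r$.
For a transient state j, let $T_j$ be the number of visits to j before absorption and let
\[ f_i(s) = E\Bigl[\prod_{j=0}^{r-1} s_j^{T_j} \Bigm| X_0 = i\Bigr], \qquad |s_j| < 1, \quad i = 0, \ldots, r - 1. \]
(a) Show that
\[ f(s) = S(I - QS)^{-1}(I - Q)\mathbf{1}, \]
where
\[ f(s) = \begin{Vmatrix} f_0(s) \\ \vdots \\ f_{r-1}(s) \end{Vmatrix}, \qquad
\mathbf{1} = \begin{Vmatrix} 1 \\ \vdots \\ 1 \end{Vmatrix}, \qquad \text{and} \qquad
S = \begin{Vmatrix} s_0 & 0 & \cdots & 0 \\ 0 & s_1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & s_{r-1} \end{Vmatrix}. \]
(b) Let $v_{ij} = E[T_j^2 \mid X_0 = i]$ and let $V = \|v_{ij}\|$. By differentiating the result of (a),
show
\[ V = N(2N_d - I) \qquad \text{where} \quad N = (I - Q)^{-1} \]
and $N_d$ is the diagonal matrix with the same diagonal as N
(see Elementary Problem 5).
(c) Show that the variance of $T_j$ given $X_0 = i$ is $\mathrm{Var}[T_j \mid X_0 = i] = n_{ij}(2n_{jj} - 1 - n_{ij})$
for $0 \le i, j < r$.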


14. Let $P = \|p_{ij}\|$ be the transition matrix of a Markov chain $\{X_n, n \ge 0\}$ on the states
0, 1, ..., N. Suppose that states 0 and N are absorbing ($p_{00} = p_{NN} = 1$) while states 1, ...,
N - 1 are transient. Let $T_k = \min\{n \ge 0 : X_n = k\}$ be the first passage time to state k for
k = 0 or N, and then specify the finite-dimensional distributions of a process $\{Y_n, n \ge 0\}$ on
states 1, ..., N using the formula
\[ \Pr\{Y_1 = j_1, \ldots, Y_n = j_n \mid Y_0 = i\} =
\frac{\Pr\{X_1 = j_1, \ldots, X_n = j_n \text{ and } T_N < T_0 \mid X_0 = i\}}{\Pr\{T_N < T_0 \mid X_0 = i\}}. \]
Informally, we speak of $\{Y_n\}$ as the process $\{X_n\}$ conditioned on eventual absorption in
state N.

(a) Show that $\{Y_n\}$ is a Markov chain having transition probabilities $q_{ij} = p_{ij}u_j/u_i$,
where $u_i = \Pr\{T_N < T_0 \mid X_0 = i\}$.
(b) Show that the n-step transition probabilities satisfy $q_{ij}^{(n)} = p_{ij}^{(n)}u_j/u_i$.
(c) If $m_{ij}$ (respectively, $n_{ij}$) is the mean number of visits to state j of $\{Y_n\}$ (respectively,
$\{X_n\}$) before absorption, conditioned on $X_0 = Y_0 = i$, show that $m_{ij} = n_{ij}u_j/u_i$ for i, j =
1, ..., N - 1.
15. In Section 2 it is indicated that for a finite absorbing Markov chain the limits
\[ b_j = \lim_{n\to\infty}\Pr\{X_n = j \mid X_n \in T, X_0 = i\}, \qquad j \in T, \]
exist, and that the elements $b_j$ form the left eigenvector of P corresponding to $\lambda_1$, the largest
nonunit eigenvalue of P. (Assume T, the set of transient states, is closed and aperiodic.)
Normalize so that $\sum_{j\in T} b_j = 1$. Then $\{b_j\}$ is known as the quasi-stationary distribution (or
asymptotic conditional distribution) of $\{X_n\}$.
Show that if $\Pr\{X_0 = j\} = b_j$ then $\Pr\{X_n = j \mid X_n \in T\} = b_j$ for all $n \ge 1$.

16. (continuation). Let $\{c_j\}_{j\in T}$ be the elements in the right eigenvector of P corresponding
to $\lambda_1$ and normalized so that $\sum_{j\in T} c_j b_j = 1$. Show that
\[ \lim_{m\to\infty}\lim_{k\to\infty}\Pr\{X_m = j \mid X_{m+k} \in T\} = b_j c_j. \]
The distribution $\{b_j c_j\}$ is called the product distribution.


17. (continuation). Let $\{X_n^*\}$ be a new Markov chain whose transition matrix $P^*$ has
elements

$$P_{ij}^* = \frac{b_j}{\lambda_1 b_i}\, p_{ji}.$$

Show that the product distribution is the stationary distribution for this chain.


18. (continuation). Let {X*} be the Markov chain whose transition matrix P* has 
elements 


Find the stationary distribution. 


19. Suppose the chain of Problem 15 has at least two absorbing states. Form a new
Markov chain by conditioning on eventual absorption into a particular fixed state. (i)
Show that the product distribution on the transient states is unchanged. (ii) Evaluate the
quasi-stationary distribution of the new chain.


NOTES 


Chapter 16 of Feller [1] is devoted to a discussion of algebraic methods in Markov chains. 
Also relevant is the book by Kemeny and Snell [2]. 
This chapter provides an introduction to some important and expanding areas in
probability theory. Section 6, on optimal stopping, requires only the definitions of
regular and superregular from the previous section as a prerequisite.


1: Taboo Probabilities 


In the null recurrent irreducible Markov chain the average of the transition
probabilities $(1/n)\sum_{m=1}^{n} P_{ij}^{m}$ tends to zero as n increases to $\infty$. Even so,
recurrence to any given state occurs with certainty. This says that the relative
frequency of visits to any given state approaches zero with the passage of time, yet
each state is visited infinitely often with probability 1. It is meaningful to
determine the number of visits to a given state i relative to the number of visits to a
different state j as the number of trials increases to infinity. For this purpose, it is
useful to introduce the “taboo transition probabilities”

$${}_kP_{ij}^{n} = \Pr\{X_n = j,\ X_\nu \ne k,\ \nu = 1, 2, \ldots, n - 1 \mid X_0 = i\} \qquad \text{for } k \ne j,\ n \ge 1,$$

the probability that the process will go from state i to state j in n steps without
entering state k in the intervening time. In this context state k is called a taboo
state. Similarly, we define for $k \ne j$, $n \ge 1$,

$${}_kf_{ij}^{n} = \Pr\{X_n = j,\ X_\nu \ne j,\ X_\nu \ne k,\ \nu = 1, 2, \ldots, n - 1 \mid X_0 = i\},$$

the probability that the process will go from state i to state j for the first time at
the nth transition without visiting state k in the intervening time. For convenience
we define for $k \ne i$

$${}_kP_{ij}^{0} = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j, \end{cases}$$

and

$${}_kf_{ij}^{0} = 0 \qquad \text{for all } i, j.$$

Now for $i \ne j$, $n \ge 0$,

$$f_{ij}^{n} = \sum_{\nu=0}^{n} {}_jP_{ii}^{\nu}\; {}_if_{ij}^{n-\nu}, \tag{1.1}$$

since the event of first transition from i to j in n steps can be decomposed into n
mutually exclusive events of returning to state i in $\nu$ steps without visiting state j
in the intervening time and then entering state j for the first time in $n - \nu$ steps
without returning to state i, $\nu = 0, 1, \ldots, n - 1$. (The term $\nu = n$ vanishes because ${}_if_{ij}^{0} = 0$.)

The key to the derivation of (1.1) consists of classifying the paths according 
to the last time prior to n at which state i is visited. This should be contrasted 
with the method underlying the derivation of formula (5.1), Chapter 2, which 
exploits the idea of the first occurrence of a given event. 

Generally, relations involving taboo states are verified by considering the 
first or last occurrence of some event. This duality between first and last plays 
an important role in many areas of probability theory. This is particularly true 
in the case of sums of identically and independently distributed random variables 
where there is a complete equivalence between these concepts. 

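The identity (1.1) can be expressed by an equivalent generating function
equation. This may be deduced by paralleling the method that led from (5.9)
to (5.10) of Chapter 2. To this end, we define the generating functions

$${}_if_{ij}(s) = \sum_{n=1}^{\infty} {}_if_{ij}^{n} s^{n}, \qquad
{}_jP_{ii}(s) = \sum_{n=0}^{\infty} {}_jP_{ii}^{n} s^{n}.$$

Then owing to the convolution form of relation (1.1) we obtain for $i \ne j$

$$f_{ij}(s) = {}_jP_{ii}(s)\; {}_if_{ij}(s). \tag{1.2}$$

Now, since

$$\sum_{n=1}^{\infty} f_{ij}^{n} \le 1 \qquad \text{and} \qquad \sum_{n=1}^{\infty} {}_if_{ij}^{n} \le 1,$$

by Abel’s Lemma, part (a) (Lemma 5.1 of Chapter 2), we conclude that

$$\lim_{s \to 1-} f_{ij}(s) = \sum_{n=1}^{\infty} f_{ij}^{n} \qquad \text{and} \qquad
\lim_{s \to 1-} {}_if_{ij}(s) = \sum_{n=1}^{\infty} {}_if_{ij}^{n}.$$

If i and j communicate, then there is an $n \ge 1$ with $f_{ij}^{n} > 0$, and then also by (1.1)

$${}_if_{ij}^{n-\nu} > 0 \qquad \text{for some } \nu \in \{0, 1, \ldots, n - 1\}.$$

Thus,

$$\sum_{n=1}^{\infty} {}_if_{ij}^{n} > 0$$

and then

$$\lim_{s \to 1-} {}_jP_{ii}(s) = \frac{\sum_{n=1}^{\infty} f_{ij}^{n}}{\sum_{n=1}^{\infty} {}_if_{ij}^{n}} < \infty.$$

Hence, by Abel’s lemma, part (b),

$${}_jP_{ii}^{*} = \sum_{n=0}^{\infty} {}_jP_{ii}^{n} = \lim_{s \to 1-} {}_jP_{ii}(s) < \infty.$$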
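The last-visit decomposition (1.1) can be checked numerically on a small chain. The following is a minimal sketch, not part of the original text: the helper names (`forbid`, `taboo_prob`, `first_passage`, `check_identity_1_1`) are hypothetical, and it assumes a finite chain stored as a list of rows. A taboo probability is obtained by deleting every transition into the taboo state and taking matrix powers.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, n):
    """n-th power of a square matrix (n >= 0)."""
    M = [[1.0 if i == j else 0.0 for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        M = mat_mul(M, A)
    return M


def forbid(P, states):
    """Copy of P with every transition INTO the given states removed."""
    return [[0.0 if j in states else p for j, p in enumerate(row)] for row in P]


def taboo_prob(P, i, j, n, k):
    """Probability of going i -> j in n steps without being in k at times 1..n-1 (k != j)."""
    return mat_pow(forbid(P, {k}), n)[i][j]


def first_passage(P, i, j, n, avoid=()):
    """Probability of first reaching j at step n from i, also avoiding `avoid` on the way."""
    if n == 0:
        return 0.0
    M = mat_pow(forbid(P, set(avoid) | {j}), n - 1)
    return sum(M[i][x] * P[x][j] for x in range(len(P)))


def check_identity_1_1(P, i, j, n):
    """Return both sides of (1.1) for the given i != j and n."""
    lhs = first_passage(P, i, j, n)
    rhs = sum(taboo_prob(P, i, i, v, k=j) * first_passage(P, i, j, n - v, avoid=(i,))
              for v in range(n + 1))
    return lhs, rhs
```

For any stochastic matrix `P` and states `i != j`, the two values returned by `check_identity_1_1(P, i, j, n)` should agree up to rounding error.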
2: Ratio Theorems 


The proofs of the two theorems of this section require the following lemma: 


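Lemma 2.1. Let

$$c_n = \sum_{\nu=0}^{n} a_{n-\nu} b_\nu, \qquad n = 0, 1, 2, \ldots,$$

where $0 \le a_\nu \le K$ (K a positive constant) and $\sum_{\nu=0}^{\infty} a_\nu = \infty$. Then the relation
$\lim_{n \to \infty} b_n = b$ with b finite implies

$$\lim_{n \to \infty} \frac{c_n}{\sum_{\nu=0}^{n} a_\nu} = b.$$

Proof. Observe that $\sum_{\nu=0}^{n} a_{n-\nu} = \sum_{\nu=0}^{n} a_\nu$. Hence,

$$\frac{c_n}{\sum_{\nu=0}^{n} a_\nu} - b
 = \frac{\sum_{\nu=0}^{n} a_{n-\nu} b_\nu}{\sum_{\nu=0}^{n} a_{n-\nu}} - b
 = \frac{\sum_{\nu=0}^{n} a_{n-\nu}(b_\nu - b)}{\sum_{\nu=0}^{n} a_{n-\nu}}.$$

Since $b_n \to b$, which is finite, there is an M > 0 such that $|b_n| \le M$ for all $n \ge 0$.
Now choose $N = N(\varepsilon)$ such that

$$|b_n - b| < \varepsilon \qquad \text{for all } n \ge N.$$

Then, for $n \ge N$,

$$\left| \frac{c_n}{\sum_{\nu=0}^{n} a_\nu} - b \right|
 \le \frac{\left| \sum_{\nu=0}^{N-1} a_{n-\nu}(b_\nu - b) \right|}{\sum_{\nu=0}^{n} a_\nu}
 + \frac{\left| \sum_{\nu=N}^{n} a_{n-\nu}(b_\nu - b) \right|}{\sum_{\nu=0}^{n} a_\nu}$$

$$\le 2M\, \frac{\sum_{\nu=0}^{N-1} a_{n-\nu}}{\sum_{\nu=0}^{n} a_\nu}
 + \varepsilon\, \frac{\sum_{\nu=N}^{n} a_{n-\nu}}{\sum_{\nu=0}^{n} a_\nu}
 \le \frac{2MNK}{\sum_{\nu=0}^{n} a_\nu} + \varepsilon.$$

Letting $n \to \infty$ and then $\varepsilon \downarrow 0$, the lemma is proved. ∎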
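Next we state three relations similar to (1.1) whose proofs are similar to the
proof of (1.1).

For $k \ne j$, $i \ne j$, and $n \ge 0$,

$${}_kP_{ij}^{n} = \sum_{\nu=0}^{n} {}_kf_{ij}^{\nu}\; {}_kP_{jj}^{n-\nu}, \tag{2.1}$$

$$P_{ij}^{n} = \sum_{\nu=0}^{n} P_{ii}^{\nu}\; {}_iP_{ij}^{n-\nu}, \tag{2.2}$$

$$P_{ij}^{n} = \sum_{\nu=0}^{n} f_{ij}^{\nu} P_{jj}^{n-\nu}. \tag{2.3}$$

Equation (2.3) is the same as (5.9) of Chapter 2.

Theorem 2.1. Let i and j be arbitrary states and assume that j is recurrent; then

$$\lim_{m \to \infty} \frac{\sum_{n=0}^{m} P_{ij}^{n}}{\sum_{n=0}^{m} P_{jj}^{n}} = f_{ij}^{*},$$

where

$$f_{ij}^{*} = \sum_{n=0}^{\infty} f_{ij}^{n}. \tag{2.4}$$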


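Proof. From Eq. (2.3)

$$\sum_{n=0}^{m} P_{ij}^{n} = \sum_{n=0}^{m} \sum_{\mu=0}^{n} f_{ij}^{n-\mu} P_{jj}^{\mu}
 = \sum_{n=0}^{m} \sum_{\mu=0}^{m} f_{ij}^{n-\mu} P_{jj}^{\mu},$$

since $f_{ij}^{n-\mu} = 0$ for $\mu > n$ (by convention).
Since both summations are in fact finite they can be interchanged:

$$\sum_{n=0}^{m} P_{ij}^{n} = \sum_{\mu=0}^{m} P_{jj}^{\mu} \sum_{n=0}^{m} f_{ij}^{n-\mu}
 = \sum_{\mu=0}^{m} P_{jj}^{\mu} \sum_{r=0}^{m-\mu} f_{ij}^{r}.$$

Define

$$F_{ij}^{m} = \sum_{r=0}^{m} f_{ij}^{r} \qquad \text{for } m = 0, 1, 2, \ldots$$

and

$$F_{ij}^{m} = 0 \qquad \text{for } m = -1, -2, \ldots.$$

Then

$$\sum_{n=0}^{m} P_{ij}^{n} = \sum_{\mu=0}^{m} P_{jj}^{\mu} F_{ij}^{m-\mu} = \sum_{\mu=0}^{m} P_{jj}^{m-\mu} F_{ij}^{\mu}.$$

Now we apply Lemma 2.1 with

$$a_n = P_{jj}^{n}, \qquad b_n = F_{ij}^{n}, \qquad \text{and} \qquad c_m = \sum_{n=0}^{m} P_{ij}^{n}.$$

The conditions of the lemma are satisfied, since

$$0 \le P_{jj}^{n} \le 1 \qquad \text{and} \qquad \sum_{\nu=0}^{\infty} P_{jj}^{\nu} = \infty$$

as j is recurrent. Since

$$\lim_{m \to \infty} b_m = \lim_{m \to \infty} F_{ij}^{m} = \sum_{r=0}^{\infty} f_{ij}^{r} = f_{ij}^{*},$$

we conclude by virtue of the lemma that

$$\lim_{m \to \infty} \frac{\sum_{n=0}^{m} P_{ij}^{n}}{\sum_{n=0}^{m} P_{jj}^{n}} = f_{ij}^{*},$$

and the theorem is proved. ∎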


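Rewriting Eq. (2.1) with $k = i \ne j$ gives

$${}_iP_{ij}^{n} = \sum_{\nu=0}^{n} {}_if_{ij}^{\nu}\; {}_iP_{jj}^{n-\nu} \qquad \text{for } n \ge 0.$$

Passing to the corresponding generating functions we have

$${}_iP_{ij}(s) = {}_if_{ij}(s)\; {}_iP_{jj}(s). \tag{2.5}$$

Since we have previously shown that ${}_iP_{jj}^{*} = \sum_{n=0}^{\infty} {}_iP_{jj}^{n} < \infty$, provided states i
and j communicate, invoking Abel’s lemma yields

$$\lim_{s \to 1-} {}_iP_{ij}(s) = \lim_{s \to 1-} {}_if_{ij}(s) \cdot \lim_{s \to 1-} {}_iP_{jj}(s) < \infty.$$

Thus, by Abel’s lemma, part (b), we have

$${}_iP_{ij}^{*} = \sum_{n=0}^{\infty} {}_iP_{ij}^{n} = \lim_{s \to 1-} {}_iP_{ij}(s) < \infty.$$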


Theorem 2.2. If i and j are in the same recurrent class, then

$$\lim_{m \to \infty} \frac{\sum_{n=0}^{m} P_{ij}^{n}}{\sum_{n=0}^{m} P_{ii}^{n}} = {}_iP_{ij}^{*}.$$

Remark. For $i \ne j$ we introduce the random variables

$$U_n = U(i, j, n) = \begin{cases}
1 & \text{if the process starting in state } i \text{ is in state } j \text{ after } n \text{ steps without} \\
  & \text{returning to state } i \text{ in the intervening time,} \\
0 & \text{otherwise.}
\end{cases} \tag{2.6}$$

Then $E[U_n] = {}_iP_{ij}^{n}$ and

$$E\left[ \sum_{n=1}^{\infty} U_n \right] = \sum_{n=1}^{\infty} {}_iP_{ij}^{n} = {}_iP_{ij}^{*}.$$

Thus it follows under the conditions of Theorem 2.2 that ${}_iP_{ij}^{*}$ is the expected
number of visits to state j between successive visits to state i.


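Proof. From Eq. (2.2),

$$\sum_{n=0}^{m} P_{ij}^{n} = \sum_{n=0}^{m} \sum_{\nu=0}^{n} P_{ii}^{\nu}\; {}_iP_{ij}^{n-\nu}
 = \sum_{n=0}^{m} \sum_{\nu=0}^{m} P_{ii}^{\nu}\; {}_iP_{ij}^{n-\nu},$$

since ${}_iP_{ij}^{n-\nu} = 0$ for $\nu > n$. Interchanging the summations, we find

$$\sum_{n=0}^{m} P_{ij}^{n} = \sum_{\nu=0}^{m} P_{ii}^{\nu} \sum_{n=0}^{m} {}_iP_{ij}^{n-\nu}
 = \sum_{\nu=0}^{m} P_{ii}^{\nu}\; {}_i\bar{P}_{ij}^{m-\nu},$$

where

$${}_i\bar{P}_{ij}^{m} = \sum_{\nu=0}^{m} {}_iP_{ij}^{\nu} \qquad \text{for } m = 0, 1, 2, \ldots$$

and

$${}_i\bar{P}_{ij}^{m} = 0 \qquad \text{for } m = -1, -2, \ldots.$$

Then,

$$\sum_{n=0}^{m} P_{ij}^{n} = \sum_{\nu=0}^{m} P_{ii}^{\nu}\; {}_i\bar{P}_{ij}^{m-\nu}
 = \sum_{\nu=0}^{m} P_{ii}^{m-\nu}\; {}_i\bar{P}_{ij}^{\nu}.$$

Now, we can apply Lemma 2.1 with $a_n = P_{ii}^{n}$, $b_n = {}_i\bar{P}_{ij}^{n}$, and $c_m = \sum_{n=0}^{m} P_{ij}^{n}$,
since $0 \le P_{ii}^{n} \le 1$ and $\sum_{n=0}^{\infty} P_{ii}^{n} = \infty$ as i is recurrent. But

$$\lim_{m \to \infty} b_m = \lim_{m \to \infty} {}_i\bar{P}_{ij}^{m} = \sum_{n=0}^{\infty} {}_iP_{ij}^{n} = {}_iP_{ij}^{*},$$

and thus it follows by Lemma 2.1 that

$$\lim_{m \to \infty} \frac{\sum_{n=0}^{m} P_{ij}^{n}}{\sum_{n=0}^{m} P_{ii}^{n}} = {}_iP_{ij}^{*}.$$

This proves the theorem. ∎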


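If i and j communicate, we can write

$$\lim_{m \to \infty} \frac{\sum_{n=0}^{m} P_{jj}^{n}}{\sum_{n=0}^{m} P_{ii}^{n}}
 = \lim_{m \to \infty} \frac{\sum_{n=0}^{m} P_{jj}^{n}}{\sum_{n=0}^{m} P_{ij}^{n}} \cdot
   \frac{\sum_{n=0}^{m} P_{ij}^{n}}{\sum_{n=0}^{m} P_{ii}^{n}},$$

since $\sum_{n=0}^{m} P_{ij}^{n} > 0$ for m sufficiently large.
Now if both i and j are recurrent and in the same class, then according to our
last two theorems the first ratio on the right-hand side tends to $1/f_{ij}^{*} = 1$, the
second ratio approaches ${}_iP_{ij}^{*}$, and therefore

$$\lim_{m \to \infty} \frac{\sum_{n=0}^{m} P_{jj}^{n}}{\sum_{n=0}^{m} P_{ii}^{n}} = {}_iP_{ij}^{*}.$$

From (1.1) and (2.5) and provided $i \ne j$, we obtain, respectively,

$$f_{ij}(s) = {}_jP_{ii}(s)\; {}_if_{ij}(s),$$
$${}_iP_{ij}(s) = {}_if_{ij}(s)\; {}_iP_{jj}(s).$$

Then from Abel’s lemma we have the identities $f_{ij}^{*} = {}_jP_{ii}^{*}\; {}_if_{ij}^{*}$ and ${}_iP_{ij}^{*} = {}_if_{ij}^{*}\; {}_iP_{jj}^{*}$.
If i and j belong to the same recurrent class, then $f_{ij}^{*} = 1$ and, therefore,

$${}_iP_{ij}^{*} = \frac{{}_iP_{jj}^{*}}{{}_jP_{ii}^{*}}.$$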
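The ratio limit of Theorem 2.2 can be observed numerically. The sketch below is an illustration only, under stated assumptions: it uses a small finite (hence positive recurrent) chain rather than a null recurrent one, the function names are hypothetical, and the infinite taboo series is approximated by truncation. It compares the ratio of partial sums with the expected number of visits to j between successive visits to i.

```python
def step(row, P):
    """One step of a row vector through the matrix P."""
    n = len(P)
    return [sum(row[k] * P[k][j] for k in range(n)) for j in range(n)]


def ratio_of_sums(P, i, j, m):
    """sum_{n=0}^m P_ij^n divided by sum_{n=0}^m P_ii^n."""
    row = [1.0 if s == i else 0.0 for s in range(len(P))]
    num = den = 0.0
    for _ in range(m + 1):
        num += row[j]
        den += row[i]
        row = step(row, P)
    return num / den


def taboo_total(P, i, j, terms=2000):
    """Truncated series sum_{n>=1} of the taboo probabilities from i to j avoiding i (i != j)."""
    # Removing every transition into i kills a path as soon as it returns to i.
    R = [[0.0 if b == i else p for b, p in enumerate(r)] for r in P]
    row = [1.0 if s == i else 0.0 for s in range(len(P))]
    total = 0.0
    for _ in range(terms):
        row = step(row, R)
        total += row[j]
    return total


# A three-state birth-death chain; its stationary distribution is (1/4, 1/2, 1/4).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
```

For this chain `taboo_total(P, 0, 1)` is close to 2, and `ratio_of_sums(P, 0, 1, m)` approaches the same value as `m` grows, with an error of order 1/m.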


3: Existence of Generalized Stationary Distributions 


In the case of an irreducible positive recurrent class the stationary distribution
$\{\pi_i\}_{i=0}^{\infty}$ constitutes a convergent positive solution (i.e., $\sum_{i=0}^{\infty} \pi_i < \infty$) to the
system of equations

$$\sum_{i=0}^{\infty} \pi_i P_{ij} = \pi_j, \qquad j = 0, 1, 2, \ldots.$$

This is the content of Theorem 1.3, Chapter 3. Our next theorem proves that this
property characterizes positive recurrence.

Theorem 3.1. Assume the Markov chain is irreducible. If the system of equations

$$\sum_{j=0}^{\infty} x_j P_{ji} = x_i, \qquad i = 0, 1, \ldots \tag{3.1}$$

has a solution for which

$$\sum_{j=0}^{\infty} |x_j| < \infty \qquad \text{and} \qquad x_j \text{ are not all zero},$$

then the Markov chain is positive recurrent.


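Proof. By simple iteration we can obtain for any $n \ge 1$

$$\sum_{j=0}^{\infty} x_j P_{ji}^{n} = x_i.$$

Let $\bar{P}^{(m)} = (1/m) \sum_{n=0}^{m-1} P^{n}$. Then

$$\sum_{j=0}^{\infty} x_j \bar{P}_{ji}^{(m)} = x_i.$$

Now let $m \to \infty$. Since

$$\sum_{j=0}^{\infty} |x_j \bar{P}_{ji}^{(m)}| \le \sum_{j=0}^{\infty} |x_j| < \infty,$$

we can interchange the limit and the summation. Then

$$x_i = \lim_{m \to \infty} \sum_{j=0}^{\infty} x_j \bar{P}_{ji}^{(m)} = \sum_{j=0}^{\infty} x_j \lim_{m \to \infty} \bar{P}_{ji}^{(m)}.$$

But

$$\lim_{m \to \infty} \bar{P}_{ji}^{(m)} = \lim_{m \to \infty} \frac{1}{m} \sum_{n=0}^{m-1} P_{ji}^{n} = \pi_i \ge 0.$$

Hence,

$$x_i = \pi_i \sum_{j=0}^{\infty} x_j.$$

Since $\sum_{j=0}^{\infty} |x_j| < \infty$ and there is an i such that $x_i \ne 0$, we have shown that for
some i

$$\pi_i \ne 0 \qquad \text{and therefore} \qquad \pi_i > 0, \quad i = 0, 1, 2, \ldots. \; ∎$$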


A converse to the theorem above is also true, in the following strengthened form 
involving inequalities. 


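Theorem 3.2. If an irreducible Markov chain is positive recurrent and $x_j \ge 0$,
j = 0, 1, 2, ... are solutions of the set of inequalities

$$\sum_{j=0}^{\infty} x_j P_{ji} \le x_i, \qquad i = 0, 1, 2, \ldots,$$

then

$$\sum_{j=0}^{\infty} x_j < \infty.$$

Proof. As before we have for any $n \ge 1$

$$\sum_{j=0}^{\infty} x_j P_{ji}^{n} \le x_i$$

and further, for any $m \ge 1$,

$$\sum_{j=0}^{\infty} x_j \bar{P}_{ji}^{(m)} \le x_i, \qquad \text{where } \bar{P}^{(m)} = \frac{1}{m} \sum_{n=1}^{m} P^{n}.$$

Since $x_j \ge 0$ and $\bar{P}_{ji}^{(m)} \ge 0$, we have $\sum_{j=0}^{M} x_j \bar{P}_{ji}^{(m)} \le x_i$ for any M > 0. If we let $m \to \infty$,

$$\lim_{m \to \infty} \sum_{j=0}^{M} x_j \bar{P}_{ji}^{(m)} = \pi_i \sum_{j=0}^{M} x_j \le x_i.$$

Since $\pi_i > 0$, the partial sums $\sum_{j=0}^{M} x_j$ are uniformly bounded for all M > 0.
Therefore

$$\sum_{j=0}^{\infty} x_j < \infty. \; ∎$$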


According to Theorem 3.1 for irreducible null recurrent processes there cannot 
exist a convergent solution of (3.1). Nevertheless, there do exist interesting 
positive solutions, as attested to by the following theorem. 


Theorem 3.3. For a recurrent irreducible Markov chain the positive sequence
given by

$$v_0 = 1, \qquad v_i = {}_0P_{0i}^{*}, \quad i = 1, 2, \ldots,$$

is a solution of the system of equations

$$v_i = \sum_{j=0}^{\infty} v_j P_{ji}, \qquad i = 0, 1, 2, \ldots \tag{3.2}$$

[see (2.6) for the definition of ${}_0P_{0i}^{*}$].


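Proof. We have, by the definition of ${}_0P_{0j}^{*}$,

$$\sum_{j=0}^{\infty} v_j P_{ji} = \sum_{j=1}^{\infty} {}_0P_{0j}^{*} P_{ji} + P_{0i}
 = P_{0i} + \sum_{j=1}^{\infty} \sum_{n=1}^{\infty} {}_0P_{0j}^{n} P_{ji}.$$

The double series on the right-hand side is composed of all nonnegative terms,
and hence, the order of summation can be interchanged to yield

$$\sum_{j=0}^{\infty} v_j P_{ji} = P_{0i} + \sum_{n=1}^{\infty} \sum_{j=1}^{\infty} {}_0P_{0j}^{n} P_{ji}.$$

Now

$$\sum_{j=1}^{\infty} {}_0P_{0j}^{n} P_{ji} = \begin{cases}
{}_0P_{0i}^{n+1} & \text{if } i \ne 0, \\
f_{00}^{n+1} & \text{if } i = 0.
\end{cases}$$

Thus, for $i \ne 0$,

$$\sum_{j=0}^{\infty} v_j P_{ji} = P_{0i} + \sum_{n=1}^{\infty} {}_0P_{0i}^{n+1}
 = \sum_{n=0}^{\infty} {}_0P_{0i}^{n+1} = {}_0P_{0i}^{*} = v_i,$$

since

$$P_{0i} = {}_0P_{0i}^{1} \qquad \text{for } i \ne 0.$$

For i = 0,

$$\sum_{j=0}^{\infty} v_j P_{j0} = P_{00} + \sum_{n=1}^{\infty} f_{00}^{n+1}
 = \sum_{n=0}^{\infty} f_{00}^{n+1} = f_{00}^{*} = 1 = v_0,$$

and the verification of (3.2) is hereby complete. ∎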
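The construction of Theorem 3.3 can be tried on a small example. The following sketch is illustrative and makes several assumptions: the chain is finite (so it is in fact positive recurrent), the name `invariant_from_taboo` is hypothetical, and the infinite series defining the taboo sums is truncated after a fixed number of terms. It builds the sequence with first entry 1 and remaining entries equal to the expected number of visits to i during an excursion from state 0, then the test checks (3.2).

```python
def invariant_from_taboo(P, terms=500):
    """Approximate v with v[0] = 1 and v[i] = sum over n >= 1 of the taboo
    probabilities of being in i at time n, starting at 0, with no return to 0."""
    n = len(P)
    # Forbid re-entry into state 0 so that only paths inside one excursion survive.
    R = [[0.0 if b == 0 else p for b, p in enumerate(row)] for row in P]
    row = [1.0] + [0.0] * (n - 1)
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(terms):
        row = [sum(row[k] * R[k][j] for k in range(n)) for j in range(n)]
        for j in range(1, n):
            v[j] += row[j]
    return v


def left_apply(v, P):
    """Compute the row vector v P, i.e. the right-hand side of (3.2)."""
    n = len(P)
    return [sum(v[j] * P[j][i] for j in range(n)) for i in range(n)]


# Hypothetical four-state irreducible chain used only for illustration.
P = [[0.2, 0.8, 0.0, 0.0],
     [0.3, 0.2, 0.5, 0.0],
     [0.0, 0.4, 0.1, 0.5],
     [0.6, 0.0, 0.0, 0.4]]
v = invariant_from_taboo(P)
```

Here `left_apply(v, P)` should reproduce `v` up to rounding, and dividing `v` by its sum gives the stationary distribution of this finite chain.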


Theorem 3.4. For a recurrent irreducible Markov chain the system

$$v_i = \sum_{j=0}^{\infty} v_j P_{ji}, \qquad i = 0, 1, 2, \ldots, \tag{3.3}$$

$$v_0 = 1, \qquad v_i \ge 0, \quad i = 1, 2, \ldots, \tag{3.4}$$

has a unique solution.


Proof. Since by the previous theorem we know that $v_0 = 1$, $v_i = {}_0P_{0i}^{*}$, is a
solution of (3.3) that also satisfies (3.4), we must prove that there is no other
solution to (3.3) fulfilling condition (3.4).

Let $\{a_i\}$ be a sequence that satisfies (3.3) and (3.4). Then

$$a_i = \sum_{j=0}^{\infty} a_j P_{ji}, \qquad i = 0, 1, 2, \ldots.$$

Multiplying both sides by $P_{ik}$ and summing over i, we have

$$a_k = \sum_{i=0}^{\infty} a_i P_{ik} = \sum_{i=0}^{\infty} P_{ik} \sum_{j=0}^{\infty} a_j P_{ji}
 = \sum_{j=0}^{\infty} a_j \sum_{i=0}^{\infty} P_{ji} P_{ik} = \sum_{j=0}^{\infty} a_j P_{jk}^{2}.$$

The interchange of the order of summation is justified since all terms in the above
expressions are nonnegative. Similarly, by repetition of this procedure we obtain
for any $n \ge 1$

$$a_i = \sum_{j=0}^{\infty} a_j P_{ji}^{n}, \qquad i = 0, 1, 2, \ldots.$$

Since the Markov chain is irreducible and recurrent, for each i there is an $n \ge 1$
such that $P_{0i}^{n} > 0$. Then

$$a_i = \sum_{j=0}^{\infty} a_j P_{ji}^{n} \ge a_0 P_{0i}^{n} > 0 \qquad \text{as } a_0 > 0.$$

Therefore, $a_i > 0$ for all i.
Next we introduce the quantities

$$Q_{ij} = \frac{a_j}{a_i} P_{ji}. \tag{3.5}$$

Clearly,

$$Q_{ij} \ge 0 \qquad \text{and} \qquad \sum_{j=0}^{\infty} Q_{ij} = \frac{1}{a_i} \sum_{j=0}^{\infty} a_j P_{ji} = \frac{a_i}{a_i} = 1.$$

Thus, we may regard the $Q_{ij}$ as transition probabilities of a Markov chain. The
second-order transition probabilities are given by

$$Q_{ij}^{2} = \sum_{k=0}^{\infty} Q_{ik} Q_{kj} = \sum_{k=0}^{\infty} \frac{a_k}{a_i} P_{ki} \frac{a_j}{a_k} P_{jk}
 = \frac{a_j}{a_i} \sum_{k=0}^{\infty} P_{jk} P_{ki} = \frac{a_j}{a_i} P_{ji}^{2}.$$

Similarly, the n-step transition probabilities are given by

$$Q_{ij}^{n} = \frac{a_j}{a_i} P_{ji}^{n}.$$

It follows that

$$\sum_{n=0}^{\infty} Q_{ii}^{n} = \sum_{n=0}^{\infty} P_{ii}^{n} = \infty$$

and the $Q_{ij}$ are transition probabilities of a recurrent Markov chain $\mathcal{Q}$. We now
apply the ratio theorems. From Theorem 2.1 we can write

$$\lim_{m \to \infty} \frac{\sum_{n=0}^{m} Q_{i0}^{n}}{\sum_{n=0}^{m} Q_{00}^{n}} = f_{i0}^{*}(\mathcal{Q}) = 1,$$

where $f_{i0}^{*}(\mathcal{Q})$ is defined with respect to the Markov chain $\mathcal{Q}$ in the usual way.
Its value is 1 since $\mathcal{Q}$ is a recurrent irreducible Markov chain. But we also have by
Theorem 2.2

$$\lim_{m \to \infty} \frac{\sum_{n=0}^{m} Q_{i0}^{n}}{\sum_{n=0}^{m} Q_{00}^{n}}
 = \frac{a_0}{a_i} \lim_{m \to \infty} \frac{\sum_{n=0}^{m} P_{0i}^{n}}{\sum_{n=0}^{m} P_{00}^{n}}
 = \frac{a_0}{a_i}\; {}_0P_{0i}^{*}.$$

Since $a_0 = 1$, we have shown that

$$a_i = {}_0P_{0i}^{*}, \qquad i = 1, 2, \ldots.$$

This proves uniqueness. ∎
4: Interpretation of Generalized Stationary Distributions 


The Markov chain $\mathcal{Q}$ with transition probability matrix $\|Q_{ij}\|$ associated with
$P = \|P_{ij}\|$ through formula (3.5) in terms of some positive solution of (3.3) is
referred to as the reversed process of P. In the positive recurrent case where
$v_i = c\pi_i$ (c = a constant), $Q_{ij}$ possesses the following interpretation. Assuming
that the initial distribution of the state variable is $\{\pi_i\}$, i.e., the Markov chain
process $\{X(n); n \ge 0\}$ begins in state k with probability $\pi_k$, we calculate the
conditional probability that the initial state was j, given the state after one
transition is i. By Bayes’ rule this is simply

$$Q_{ij} = \Pr\{X(0) = j \mid X(1) = i\} = \frac{\Pr\{X(1) = i \mid X(0) = j\} \Pr\{X(0) = j\}}{\Pr\{X(1) = i\}}. \tag{4.1}$$

Since the process is stationary,

$$\Pr\{X(1) = i\} = \pi_i, \qquad \Pr\{X(0) = j\} = \pi_j,$$

and (4.1) obviously becomes

$$Q_{ij} = \frac{\pi_j P_{ji}}{\pi_i}. \tag{4.2}$$

Iteration of (4.2) does indeed reverse the time scale. We easily deduce that if
X(0) has the initial stationary distribution $\{\pi_k\}$, then

$$Q_{ij}^{n} = \Pr\{X(0) = j \mid X(n) = i\}.$$

The name “reversed process” assigned to $\|Q_{ij}\|$ is manifestly apt.
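A small computation makes the time reversal in (4.2) concrete. This is a minimal sketch rather than anything from the text: the function names are hypothetical, the stationary vector is approximated by plain power iteration (adequate here because the example chain is aperiodic and converges quickly), and the example matrix is chosen so that the chain drifts around the cycle 0 → 1 → 2 → 0.

```python
def stationary(P, iters=200):
    """Approximate the stationary row vector by iterating pi <- pi P."""
    n = len(P)
    pi = [1.0] + [0.0] * (n - 1)
    for _ in range(iters):
        pi = [sum(pi[k] * P[k][j] for k in range(n)) for j in range(n)]
    return pi


def reversed_chain(P, v):
    """Q[i][j] = v[j] * P[j][i] / v[i], as in (3.5) and (4.2)."""
    n = len(P)
    return [[v[j] * P[j][i] / v[i] for j in range(n)] for i in range(n)]


# Each state either stays put or moves one step forward around the cycle.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
pi = stationary(P)
Q = reversed_chain(P, pi)
```

In this example the stationary distribution is uniform, so `Q` is the transpose of `P`: the reversed chain moves around the cycle in the opposite direction, 0 → 2 → 1 → 0.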

The above method of introducing the reversed process applies whenever a 
positive (not necessarily convergent) solution of (3.3) is available. This device 
will be of further use in Theorem 5.2 below. 

As indicated by Theorem 3.1, the solution of (3.3) is divergent in the null
recurrent case and convergent for the positive recurrent case. An interesting
interpretation can be given to any solution of (3.3), convergent or not. For the
positive recurrent case the values $\{v_i\}_{i=0}^{\infty}$ are proportional to the stationary
probabilities of being in the various states. In the general case the values $v_i$ can
be interpreted as the stationary expected number of particles in state i under
suitable conditions of equilibrium. The precise sense of this equilibrium
phenomenon is described in the following theorem.
Theorem 4.1. Suppose a denumerable number of particles are independently
undergoing a Markov process determined by $P = \|P_{ij}\|$. Suppose $A_i(n)$ represents
the number of particles in state i at time n. If $A_i(0)$, i = 1, 2, ..., are independent
Poisson random variables with means $v_i$, where $\sum_k v_k P_{ki} = v_i$, $v_i > 0$, then $A_i(n)$,
i = 1, 2, ..., are again independent random variables (r.v.’s) with the same
distribution as $A_i(0)$, i = 1, 2, ..., respectively.


Remark. The statement that an infinite number of random variables are
independent and each Poisson distributed shall mean that any finite
subcollection has this property.


Proof. We proceed by induction.
Let $A_k(n; i)$ be the number of particles in state k at time n which were in state
i at time n − 1. Define the vectors A(n) and A(n; i) by

$$A(n) = (A_1(n), A_2(n), \ldots); \qquad A(n; i) = (A_1(n; i), A_2(n; i), \ldots).$$

Then $A(n) = \sum_i A(n; i)$. Now, by the induction hypothesis (that $A_i(n - 1)$,
i = 1, 2, ... are independent r.v.’s) and because of the assumption that the
particles act independently, it follows that the vectors A(n; i), i = 1, 2, ..., are
independent random variables. We shall further show that the components
$A_k(n; i)$, k = 1, 2, ..., of each vector A(n; i) are independent random variables.
It then follows that $A_k(n) = \sum_i A_k(n; i)$, k = 1, 2, ..., are independent r.v.’s.

Now for any finite number of components $k_1, \ldots, k_r$ and integers $a_1, a_2, \ldots, a_r$, we
have

$$\Pr\{A_{k_1}(n; i) = a_1, A_{k_2}(n; i) = a_2, \ldots, A_{k_r}(n; i) = a_r\}$$
$$= \sum_{a=0}^{\infty} \Pr\{A_i(n - 1) = a\}
 \times \Pr\{A_{k_1}(n; i) = a_1, \ldots, A_{k_r}(n; i) = a_r \mid A_i(n - 1) = a\}. \tag{4.3}$$

By the induction hypothesis the first factor in the sum is equal to

$$\exp(-v_i)\, \frac{v_i^{a}}{a!}. \tag{4.4}$$

Because of the conditioning and the fact that the particles act independently,
the second factor is a multinomial probability distribution. Thus, writing $a^* = \sum_{m=1}^{r} a_m$,

$$\Pr\{A_{k_1}(n; i) = a_1, \ldots, A_{k_r}(n; i) = a_r \mid A_i(n - 1) = a\}
 = \frac{a!}{a_1!\, a_2! \cdots a_r!\, (a - a^*)!}
   \left( \prod_{m=1}^{r} P_{ik_m}^{a_m} \right)
   \left( 1 - \sum_{m=1}^{r} P_{ik_m} \right)^{a - a^*}. \tag{4.5}$$

This expression is taken to be zero in the case $\sum_{m=1}^{r} a_m > a$.
Inserting (4.4) and (4.5) into (4.3) gives

$$\Pr\{A_{k_1}(n; i) = a_1, A_{k_2}(n; i) = a_2, \ldots, A_{k_r}(n; i) = a_r\}$$
$$= \prod_{m=1}^{r} \frac{(v_i P_{ik_m})^{a_m}}{a_m!} \exp(-v_i P_{ik_m})
 \times \sum_{a=a^*}^{\infty} \frac{\left[ v_i \left( 1 - \sum_{m=1}^{r} P_{ik_m} \right) \right]^{a - a^*}}{(a - a^*)!}
 \exp\left[ -v_i \left( 1 - \sum_{m=1}^{r} P_{ik_m} \right) \right]. \tag{4.6}$$

But the sum plainly equals 1 (set $a = a^* + \alpha$, sum over $\alpha$) since it is just the sum
of the Poisson density for the parameter

$$v_i \left( 1 - \sum_{m=1}^{r} P_{ik_m} \right).$$

The resulting factorization in (4.6) shows that $\{A_k(n; i)\}$ are independently
Poisson distributed with means $\{v_i P_{ik}\}$, respectively.

Hence, $A_k(n) = \sum_i A_k(n; i)$ are independent Poisson variables with
respective means $\sum_i v_i P_{ik} = v_k$.

Finally we note that $A_i(0)$, i = 1, 2, ..., were prescribed to be independent
and Poisson distributed with means $v_i$, which together with the induction
argument implies the result. ∎
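The equilibrium described by Theorem 4.1 can be explored by simulation. The following sketch is only a statistical illustration, not a proof: the function names, the three-state chain, the invariant vector (1, 2, 1), the number of trials, and the random seed are all assumptions made for the example. It starts with independent Poisson particle counts, moves every particle independently for a few steps, and estimates the mean and variance of the count in each state, both of which should stay close to the Poisson means.

```python
import math
import random


def poisson(rng, lam):
    """Sample a Poisson variate by Knuth's multiplication method (small means only)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def move(rng, state, P):
    """Move one particle one step according to row `state` of P."""
    u, acc = rng.random(), 0.0
    for j, p in enumerate(P[state]):
        acc += p
        if u < acc:
            return j
    return len(P) - 1


def simulate(P, v, steps, trials, seed=0):
    """Return sample means and variances of the particle counts after `steps` steps."""
    rng = random.Random(seed)
    n = len(P)
    sums, squares = [0.0] * n, [0.0] * n
    for _trial in range(trials):
        particles = [i for i in range(n) for _p in range(poisson(rng, v[i]))]
        for _step in range(steps):
            particles = [move(rng, s, P) for s in particles]
        for i in range(n):
            c = particles.count(i)
            sums[i] += c
            squares[i] += c * c
    means = [s / trials for s in sums]
    variances = [q / trials - m * m for q, m in zip(squares, means)]
    return means, variances


# v = (1, 2, 1) satisfies v P = v for this birth-death chain.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
means, variances = simulate(P, [1.0, 2.0, 1.0], steps=3, trials=20000, seed=1)
```

Equality of mean and variance is only a necessary sign of a Poisson law; checking independence across states would require estimating covariances as well.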


5: Regular, Superregular, and Subregular Sequences for Markov Chains 


Various criteria for determining recurrence, transience, and positive recurrence
of a Markov chain were given and applied to some Markov chain queueing
models in Theorems 4.1-4.2 of Chapter 3. The conditions set forth there
pertained to the nature of the solutions of either of the systems of equations

$$\sum_{i=0}^{\infty} \xi_i P_{ik} = \xi_k, \qquad k = 0, 1, 2, \ldots, \tag{5.1}$$

and

$$\sum_{k=0}^{\infty} P_{ik} \eta_k = \eta_i, \qquad i = 0, 1, 2, \ldots.$$

The modern point of view on this problem is to present these developments as a
corollary of the theory of regular, superregular, and subregular sequences. We
now elaborate some of the simpler aspects of this elegant theory, which also
embodies the elements of potential theory for Markov matrix operators. The
classical potential theory appears in treating these same ideas in the case of
Brownian motion. This interplay between potential theory and probability has
received much attention in recent years and provides insights from one field to
the other.

Let P be a given transition probability matrix. A nonnegative vector (i.e.,
sequence) $u = \{u(j)\}_{j=0}^{\infty}$ is said to be, relative to P,

right regular if $\sum_j P_{ij} u(j) = u(i)$ (abbreviated r-regular),
right superregular if $\sum_j P_{ij} u(j) \le u(i)$ (r-superregular),
right subregular if $\sum_j P_{ij} u(j) \ge u(i)$ (r-subregular).

A right superregular sequence $\{u(i)\}$ is said to be minimal if $0 \le \xi(i) \le u(i)$
($i \ge 0$) and $\xi(i)$ regular implies $\xi(i) = cu(i)$ for all i and for some constant c. A
nonnegative vector $\{v(i)\}_{i=0}^{\infty}$ is said to be

left regular if $\sum_i v(i) P_{ij} = v(j)$ (l-regular),
left superregular if $\sum_i v(i) P_{ij} \le v(j)$ (l-superregular),
left subregular if $\sum_i v(i) P_{ij} \ge v(j)$ (l-subregular).

Our first result is a representation theorem for right regular sequences in
terms of minimal regular sequences.


Theorem 5.1. Let u be a vector which is r-superregular with respect to P. Then

$$a(i) = \lim_{n \to \infty} \sum_j P_{ij}^{n} u(j)$$

exists for all i, and a is an r-regular vector with respect to P. Moreover, if b is a
vector which is r-regular with respect to P and $b(i) \le u(i)$ for all i, then $b(i) \le a(i)$
for all i. If we write

$$u(i) = a(i) + c(i), \qquad i \ge 0, \tag{5.2}$$

where

$$c(i) = u(i) - a(i),$$

then c(i) is minimal r-superregular.
Proof. We have, using the definition of r-superregular,

$$\sum_j P_{ij}^{n} u(j) = \sum_j \sum_k P_{ik}^{n-1} P_{kj} u(j)
 = \sum_k P_{ik}^{n-1} \sum_j P_{kj} u(j) \le \sum_k P_{ik}^{n-1} u(k).$$

In concise notation we can write this as $P^{n}u \le P^{n-1}u$, where inequality between
vectors is understood as being componentwise. Thus for each i,

$$u(i) \ge \sum_j P_{ij} u(j) \ge \sum_j P_{ij}^{2} u(j) \ge \cdots.$$

Since all the terms are nonnegative, a(i) exists and $a(i) \le u(i)$ for all i. Now

$$\sum_j P_{ij} \sum_k P_{jk}^{n} u(k) = \sum_k P_{ik}^{n+1} u(k).$$

If we let $n \to \infty$, the right side converges to a(i), while the left side, if we formally
carry out the passage to the limit under the summation, tends to

$$\sum_j P_{ij} a(j);$$

in other words,

$$\sum_j P_{ij} a(j) = a(i).$$

To justify the passage to the limit within the summation, let i be considered as
fixed. For $\varepsilon > 0$ there exists $N(\varepsilon)$ such that

$$\sum_{j > N(\varepsilon)} P_{ij} u(j) < \varepsilon.$$

Thus

$$\sum_{j > N(\varepsilon)} P_{ij} \sum_k P_{jk}^{n} u(k) \le \sum_{j > N(\varepsilon)} P_{ij} u(j) < \varepsilon \qquad \text{for all } n.$$

Now

$$\sum_j P_{ij} \sum_k P_{jk}^{n} u(k) = \sum_{j \le N(\varepsilon)} P_{ij} \sum_k P_{jk}^{n} u(k)
 + \sum_{j > N(\varepsilon)} P_{ij} \sum_k P_{jk}^{n} u(k).$$

As $n \to \infty$, we have already seen that the left side tends to a(i). Since the first
sum over j on the right side is a finite sum, its limit is just

$$\sum_{j \le N(\varepsilon)} P_{ij} a(j).$$

The second term is at most $\varepsilon$. Thus

$$a(i) = \sum_{j \le N(\varepsilon)} P_{ij} a(j) + d(i), \qquad \text{where } 0 \le d(i) \le \varepsilon,$$

which implies readily that $a(i) = \sum_j P_{ij} a(j)$, and a = {a(i)} is r-regular with
respect to P. Finally, suppose that

$$b(i) = \sum_j P_{ij} b(j) \le u(i) \qquad \text{for all } i.$$

Then by induction

$$b(i) = \sum_j P_{ij}^{n} b(j) \le \sum_j P_{ij}^{n} u(j) \qquad \text{for all } n, i.$$

Hence

$$b(i) \le \lim_{n \to \infty} \sum_j P_{ij}^{n} u(j) = a(i) \qquad \text{for all } i.$$

It is trivial to verify that c(i) = u(i) − a(i) is r-superregular. It remains only
to establish the minimal character of c(i). Suppose

$$0 \le \xi(i) \le c(i), \qquad i \ge 0, \tag{5.3}$$

where $\{\xi(i)\}$ is r-regular. Applying the matrix P to (5.3) n times yields

$$\xi = P^{n}\xi \le P^{n}c.$$

But by the definition of c we see that $P^{n}c = P^{n}u - P^{n}a = P^{n}u - a$ tends to the
zero vector. This conclusion verifies that {c(i)} is minimal r-superregular as
claimed. ∎
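The decomposition (5.2) can be computed explicitly for a small absorbing chain. The sketch below is an illustration under stated assumptions: the chain is the symmetric random walk on {0, 1, 2, 3, 4} with absorbing endpoints, the superregular vector u(i) = i(4 − i) + 1 is chosen for the example, the helper names are hypothetical, and the limit is approximated by a fixed number of applications of P.

```python
def apply(P, u):
    """Compute the column vector P u."""
    return [sum(p * x for p, x in zip(row, u)) for row in P]


def is_superregular(P, u, eps=1e-12):
    """Check sum_j P_ij u(j) <= u(i) for every i (up to rounding)."""
    return all(y <= x + eps for x, y in zip(u, apply(P, u)))


def regular_part(P, u, iters=200):
    """Approximate a = lim P^n u by applying P repeatedly."""
    a = list(u)
    for _ in range(iters):
        a = apply(P, a)
    return a


# Symmetric random walk on {0,...,4}; states 0 and 4 are absorbing.
P = [[1.0, 0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.0, 1.0]]
u = [i * (4 - i) + 1.0 for i in range(5)]   # (1, 4, 5, 4, 1)
a = regular_part(P, u)                      # regular part of u
c = [x - y for x, y in zip(u, a)]           # minimal superregular part
```

In this example the regular part is the constant vector 1 and the remainder c(i) = i(4 − i) is the expected time to absorption from i, which is the discrete analogue of a potential.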


The representation (5.2) in the case of Brownian motion becomes the classical 
Riesz representation involving superharmonic functions, harmonic functions, 
and potential functions. The development of this elegant theory is beyond the 
level of this book. 

For transient Markov chains we can construct r-superregular functions very
simply. Recall that in this case $\sum_{n=0}^{\infty} P_{ik}^{n} < \infty$ for all $i, k \ge 0$.

We claim that for $k_0$ fixed

$$u_i = \sum_{n=0}^{\infty} P_{ik_0}^{n}, \qquad i = 0, 1, \ldots, \tag{5.4}$$

is r-superregular. Indeed,

$$\sum_{j=0}^{\infty} P_{ij} u_j = \sum_{n=0}^{\infty} P_{ik_0}^{n+1} = u_i - P_{ik_0}^{0} \le u_i. \tag{5.5}$$

The above construction exhibits an abundance of nonconstant positive
r-superregular vectors. The nonconstant character of (5.4) is evident since
inequality prevails in (5.5) when $i = k_0$. The situation is quite different for the
recurrent case. The next theorem asserts that the only r-superregular sequence is
the constant vector. This generalizes the criteria of Theorem 4.1, Chapter 3.


Theorem 5.2. An irreducible Markov chain with transition matrix P is recurrent
if and only if every nonnegative vector u which is r-superregular with respect to P
and has at least one positive component is a constant vector.

Proof. Example (a), Section 6 of Chapter 6 can be modified, invoking the 
convergence of a nonnegative supermartingale, to yield a shorter proof than 
the direct approach that we now give. 
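Let the Markov chain be recurrent and consider

$$u_i \ge \sum_j P_{ij} u_j, \qquad u_i \ge 0 \quad \text{for all } i.$$

First we show that if $u_{j_0} > 0$ for some $j_0$, then $u_j > 0$ for all j. In fact, given $j_0$
and k there is an n such that $P_{kj_0}^{n} > 0$. As in the preceding theorem

$$u_k \ge \sum_j P_{kj}^{n} u_j \ge P_{kj_0}^{n} u_{j_0} > 0.$$

Thus if $u \not\equiv 0$ and is nonnegative, then $u_i > 0$ for all i. Now let k be arbitrary
but fixed, and set $\xi_i = u_i/u_k$. Then

$$\xi_i \ge \sum_j P_{ij} \xi_j = \sum_{j \ne k} P_{ij} \xi_j + P_{ik}. \tag{5.6}$$

Iterating this inequality yields

$$\xi_i \ge \sum_{j \ne k} P_{ij} \left[ \sum_{s \ne k} P_{js} \xi_s + P_{jk} \right] + P_{ik}$$
$$= \sum_{j, s \ne k} P_{ij} P_{js} \xi_s + \sum_{j \ne k} P_{ij} P_{jk} + P_{ik}$$
$$= \sum_{j, s \ne k} P_{ij} P_{js} \xi_s + f_{ik}^{2} + f_{ik}^{1},$$

where we have interpreted the last two terms as first passage probabilities.
Another application of (5.6) gives

$$\xi_i \ge \sum_{j, s \ne k} P_{ij} P_{js} \left[ \sum_{r \ne k} P_{sr} \xi_r + P_{sk} \right] + f_{ik}^{2} + f_{ik}^{1}$$
$$= \sum_{j, s, r \ne k} P_{ij} P_{js} P_{sr} \xi_r + \sum_{j, s \ne k} P_{ij} P_{js} P_{sk} + f_{ik}^{2} + f_{ik}^{1}$$
$$= \sum_{j, s, r \ne k} P_{ij} P_{js} P_{sr} \xi_r + f_{ik}^{3} + f_{ik}^{2} + f_{ik}^{1}.$$

Proceeding in this manner a simple induction proves

$$\xi_i \ge \sum_{n=1}^{m} f_{ik}^{n} \qquad \text{for all } m$$

and therefore

$$\xi_i \ge \sum_{n=1}^{\infty} f_{ik}^{n} = 1,$$

since the chain is recurrent and irreducible.
Thus $\xi_i = u_i/u_k \ge 1$ or $u_i \ge u_k$. But i and k are arbitrary; hence $u_i = u_k$
for all i, k.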
To prove the converse we assume that the chain is nonrecurrent. Let k be
arbitrary and set

$$u_i = \begin{cases} f_{ik}^{*} & \text{if } i \ne k, \\ 1 & \text{if } i = k. \end{cases}$$

(Recall that $f_{ik}^{*}$ is the probability of ever visiting state k starting from state i.)
Then

$$u_i = f_{ik}^{*} = \sum_{j \ne k} P_{ij} f_{jk}^{*} + P_{ik} = \sum_j P_{ij} u_j \qquad \text{if } i \ne k$$

and

$$u_k = 1 \ge f_{kk}^{*} = \sum_{j \ne k} P_{kj} f_{jk}^{*} + P_{kk} = \sum_j P_{kj} u_j.$$

Thus u is r-superregular. Now if $f_{jk}^{*} = 1$ for all $j \ne k$, we have

$$f_{kk}^{*} = \sum_{j \ne k} P_{kj} f_{jk}^{*} + P_{kk} = \sum_{j \ne k} P_{kj} + P_{kk} = \sum_j P_{kj} = 1,$$

which contradicts the assumption of nonrecurrence of the Markov chain.
Hence u is a nonconstant bounded r-superregular vector. ∎


With the assistance of Theorem 5.2 it is easy to derive a stronger version of 
Theorem 3.4 which permits inequalities, namely, 


Theorem 5.3. For a recurrent irreducible Markov chain the system

$$v_i \ge \sum_{j=0}^{\infty} v_j P_{ji}, \quad i = 0, 1, \ldots, \qquad v_0 = 1, \quad v_i \ge 0, \quad i = 1, 2, \ldots \tag{5.7}$$

has a unique solution.

The method of proof as in Theorem 3.4 is to pass to the reverse process of P
and thereby reduce the consideration of l-superregular vector sequences to that
of r-superregular sequences. We then appeal to Theorem 5.2 for the desired
conclusion. This device of using the reversed process to transform problems
involving left regular concepts to right regular concepts is common and quite
powerful.


Proof. We have seen earlier that $v_0 = 1$, $v_i = {}_0P_{0i}^{*}$ is a solution of (5.7). In fact,
this sequence is a solution with $\ge$ replaced by equality and possessing the
property that $v_i > 0$ for all i. Setting

$$Q_{ij} = \frac{v_j P_{ji}}{v_i},$$

we have $Q_{ij} \ge 0$ and

$$\sum_{j=0}^{\infty} Q_{ij} = \frac{1}{v_i} \sum_{j=0}^{\infty} v_j P_{ji} = \frac{v_i}{v_i} = 1,$$

so that $Q = \|Q_{ij}\|$ is a transition probability matrix. Furthermore, as in Theorem
3.4,

$$Q_{ik}^{n} = \frac{v_k}{v_i} P_{ki}^{n}$$

and the irreducibility and recurrence of Q follow from that of P. Now if

$$c_i \ge \sum_{j=0}^{\infty} c_j P_{ji}, \qquad i = 0, 1, \ldots,$$

and

$$c_0 = 1, \qquad c_j \ge 0, \quad j = 1, 2, \ldots,$$

then

$$\sum_{j=0}^{\infty} Q_{ij} \frac{c_j}{v_j} = \frac{1}{v_i} \sum_{j=0}^{\infty} c_j P_{ji} \le \frac{c_i}{v_i},$$

which says that the vector $\{c_i/v_i\}$ is r-superregular with respect to Q. But the
preceding theorem asserts that $\{c_i/v_i\}$ is a constant vector. Since $c_0 = v_0 = 1$,
we conclude that $c_i = v_i$, i = 0, 1, ..., which proves the theorem. ∎


6: Stopping Rule Problems 


Stopping rule problems occur in many areas of management science as special 
types of control problems in which the decision maker has only two available 
actions, “stop” and “continue.” The analysis of these problems uses the theory 
of superregular vectors which motivates us to develop part of this theory here. 

Let $\{X_n, n \ge 0\}$ be a discrete state Markov chain with transition matrix
$P = \|P_{ij}\|$. Associated with each state j let r(j) be a nonnegative reward. To
achieve some simplicity, yet preserving the underlying ideas, we will assume
that {r(j)} is a bounded nonnegative sequence, $0 \le r(j) \le \|r\| = \sup_j r(j) < \infty$.
We sequentially observe $X_0, X_1, \ldots$. At each stage the option is available
either to stop or continue observing. If at stage n we stop, the reward $r(X_n)$ is
acquired and the game has culminated. If we never stop, then no payment
accrues. Our task is to characterize the maximal expected reward and
determine an optimal rule for stopping, provided one exists.

Corresponding to a prescribed decision procedure, let T be the time of
stopping, with $T = \infty$ signifying the event that we continue forever. Clearly the
problem is meaningful only when T = n is determined by the history up to
time n and not the future. More precisely, we require for every n = 0, 1, ...
that the event {T = n} be determined only by $X_0, X_1, \ldots, X_n$ (and, of course,
on knowledge of the total reward vector r = {r(j)} and the transition
probabilities $P_{ij}$). A random variable associated with a Markov process and having
this property is called a Markov time.

Call a sequence of states $i_0, i_1, \ldots, i_n$ a stopping sequence if along this
sequence we decide to stop at $i_n$ but not before. A Markov time is fully specified by
producing a list of all its stopping sequences, since if at any time n the chain
has evolved through $X_0 = i_0, \ldots, X_n = i_n$, in order to decide whether to stop
(T = n) or not (T > n) we simply check to see if the sequence $i_0, \ldots, i_n$ is on the
list or not.

Stopping at a fixed time T = m is the simplest example of a Markov time.
Specifically, any (m + 1)-tuple $(i_0, \ldots, i_m)$ of states is a stopping sequence for
this time. Also the first time, if any, that the process reaches a given set B of
possible states is a Markov time. A stopping sequence in this case is any sequence
$(i_0, \ldots, i_\nu)$ of states for which $i_k \notin B$ for $k = 0, \ldots, \nu - 1$ and $i_\nu \in B$. If $T_1$ and $T_2$
are Markov times then so is $T = \min\{T_1, T_2\}$. Every stopping sequence for T
may be found in the union of the set of stopping sequences for $T_i$, i = 1, 2.
It follows from this discussion that if T is any Markov time then $T_m = \min\{T, m\}$
is a bounded Markov time.

If u = (u(i)) is a nonnegative vector, the expression $u(X_T)$ may not possess
meaning since $T = \infty$ is a possibility. It is convenient to adopt the convention
that $u(X_T) = 0$ when $T = \infty$. This is consistent with our prerequisite that never
stopping entails no reward. Thus the expected reward $E[r(X_T) \mid X_0 = i]$ is the
expectation in the standard sense, of the random variable Z, where

$$Z = \begin{cases} r(X_T) & \text{if } T < \infty, \\ 0 & \text{if } T = \infty. \end{cases}$$

Computationally,

$$E[r(X_T) \mid X_0 = i] = \sum_{n=0}^{\infty} \sum_{i_n} r(i_n) \Pr\{T = n \text{ and } X_n = i_n \mid X_0 = i\}$$
$$= \sum_{i_0, \ldots, i_n} r(i_n) \Pr\{X_0 = i_0, \ldots, X_n = i_n \mid X_0 = i\},$$

where the last sum runs over all stopping sequences $(i_0, \ldots, i_n)$ for T.


Define the optimal reward vector v by

$$v(i) = \sup\{E[r(X_T) \mid X_0 = i];\ T \text{ a Markov time}\}.$$

Since $r(j) \le \|r\| < \infty$ for all j, then manifestly $v(j) \le \|r\| < \infty$ for all j. We will
establish that v is the smallest right superregular vector satisfying $v(i) \ge r(i)$
for all i. Moreover, if an optimal Markov time exists, then it may be taken to be
the first time, if any, that the process is in the set of states i for which v(i) = r(i).
That is,

$$T^* = \begin{cases} \min\{n \ge 0 : v(X_n) = r(X_n)\}, \\ \infty & \text{if } v(X_n) > r(X_n) \text{ for all } n. \end{cases}$$

This makes sense. It calls for continuing as long as the optimal reward $v(X_n)$
exceeds the reward $r(X_n)$ available through stopping immediately, and stopping
at the first state $X_n$ in which $v(X_n) = r(X_n)$. The stopping sequences are of the
form $(i_0, \ldots, i_m)$ where $v(i_k) > r(i_k)$ for k = 0, ..., m − 1 and $v(i_m) = r(i_m)$.
First we catalog some properties of bounded right superregular vectors.


Lemma 6.1. (a) Every nonnegative constant vector is an r-superregular vector.
(b) If for each $\alpha$ in some set $A$, $u_\alpha$ is an r-superregular vector and we define
$u(j) = \inf_\alpha u_\alpha(j)$, then $u$ is r-superregular.
(c) If $u$ is an r-superregular vector, then $u(i) \ge \sum_j P_{ij}^{n} u(j)$ for any $i$ and
$n = 1, 2, \dots$.


Proof. Since $\sum_j P_{ij} = 1$, every nonnegative constant is manifestly an r-regular
vector, a fortiori r-superregular, and statement (a) is proved. Let $u(i) = \inf_\alpha u_\alpha(i)$
where each $u_\alpha$ is r-superregular. Certainly $u$ is nonnegative since each $u_\alpha$ is. Also,
that $u(j) \le u_\alpha(j)$ for all $j$ and that $u_\alpha$ is r-superregular imply the inequality

$$\sum_j P_{ij} u(j) \le \sum_j P_{ij} u_\alpha(j) \le u_\alpha(i)$$

for all $i$ and an arbitrary $\alpha$. It follows that

$$\sum_j P_{ij} u(j) \le \inf_\alpha u_\alpha(i) = u(i),$$

which shows that $u$ is r-superregular. This proves (b).
The assertion of (c) is true for $n = 1$ since $u$ is r-superregular by assumption.
We proceed by induction. Suppose for $n$ that

$$u(k) \ge \sum_j P_{kj}^{n} u(j)$$

holds. Now multiply both sides by the nonnegative $P_{ik}$ and sum over $k$ to get

$$\sum_k P_{ik} u(k) \ge \sum_k P_{ik} \sum_j P_{kj}^{n} u(j).$$

The left-hand side is at most $u(i)$ since $u$ is r-superregular, while the right-hand
side is $\sum_j P_{ij}^{n+1} u(j)$. Thus,

$$u(i) \ge \sum_j P_{ij}^{n+1} u(j)$$

is secured, advancing the induction. The proof is complete. ∎


The statement of part (c) can be written compactly in the form

$$u(i) \ge E[u(X_n) \mid X_0 = i]$$

for any time index $n$. The next theorem indicates that the fixed index $n$ can be
replaced by any Markov time $T$ while maintaining the inequality.

Theorem 6.1. Let $u$ be a bounded r-superregular vector and $T$ a Markov time.
Then

(a) $u(i) \ge E[u(X_T) \mid X_0 = i]$ for all $i$.

If $S$ and $T$ are Markov times for which $S \le T$ (every stopping sequence for $T$
contains a stopping sequence for $S$), then

(b) $E[u(X_S) \mid X_0 = i] \ge E[u(X_T) \mid X_0 = i]$ for all $i$.


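Proof. We need only prove (b) since (a) follows by taking $S = 0$.
Let $\alpha$ be a constant, $0 < \alpha < 1$. Since $u$ is r-superregular, the function $h(i) =
u(i) - \alpha \sum_j P_{ij} u(j)$ is nonnegative. With obvious iterations we may write

$$\begin{aligned}
u(i) &= h(i) + \alpha \sum_j P_{ij} u(j) \\
&= h(i) + \alpha \sum_j P_{ij} \Big[ h(j) + \alpha \sum_k P_{jk} u(k) \Big] \\
&= h(i) + \alpha \sum_j P_{ij} h(j) + \cdots + \alpha^{n-1} \sum_j P_{ij}^{n-1} h(j) + \alpha^{n} \sum_j P_{ij}^{n} u(j). \qquad (6.1)
\end{aligned}$$

Part (c) of Lemma 6.1 tells us that $u(i) \ge \sum_j P_{ij}^{n} u(j)$ and we have further stipulated
$\|u\| < \infty$. It follows, as $n \to \infty$, that the last term goes to zero and (6.1)
passes to the equation

$$\begin{aligned}
u(i) &= \sum_{n=0}^{\infty} \alpha^{n} \sum_j P_{ij}^{n} h(j) \\
&= \sum_{n=0}^{\infty} \alpha^{n} E[h(X_n) \mid X_0 = i] \\
&= E\Big[ \sum_{n=0}^{\infty} \alpha^{n} h(X_n) \,\Big|\, X_0 = i \Big], \qquad (6.2)
\end{aligned}$$

where $h$ is a nonnegative function.
Invoking the Markov property, we infer on the basis of (6.2) that for any
fixed $n$

$$\begin{aligned}
u(i) &= E[h(X_n) + \alpha h(X_{n+1}) + \alpha^{2} h(X_{n+2}) + \cdots \mid X_n = i] \\
&= E\Big[ \sum_{k=n}^{\infty} \alpha^{k-n} h(X_k) \,\Big|\, X_n = i \Big]. \qquad (6.3)
\end{aligned}$$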


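Let $T$ be the given Markov time. The event $\{T = n\}$ is a union of events of the
form $\{X_0 = i_0, \dots, X_n = i_n\}$ where $(i_0, \dots, i_n)$ is a stopping sequence such that
$T$ calls for stopping at $n$ and not before. Let $I_n(i)$ denote the set of these stopping
sequences of length $n$ for which $i_0 = i$. We have

$$\begin{aligned}
E\Big[ \sum_{k=T}^{\infty} \alpha^{k} h(X_k) \,\Big|\, X_0 = i \Big]
&= \sum_{n=0}^{\infty} \sum_{(i_0, \dots, i_n) \in I_n(i)} E\Big[ \sum_{k=T}^{\infty} \alpha^{k} h(X_k) \,\Big|\, X_0 = i_0, \dots, X_n = i_n \Big] \\
&\qquad \times \Pr\{X_0 = i_0, \dots, X_n = i_n \mid X_0 = i\} \quad \text{(by the law of total probabilities)} \\
&= \sum_{n=0}^{\infty} \sum_{(i_0, \dots, i_n) \in I_n(i)} E\Big[ \sum_{k=n}^{\infty} \alpha^{k} h(X_k) \,\Big|\, X_n = i_n \Big] \\
&\qquad \times \Pr\{X_0 = i_0, \dots, X_n = i_n \mid X_0 = i\} \\
&= \sum_{n=0}^{\infty} \sum_{(i_0, \dots, i_n) \in I_n(i)} \alpha^{n} u(i_n) \\
&\qquad \times \Pr\{X_0 = i_0, \dots, X_n = i_n \mid X_0 = i\} \quad \text{(by virtue of (6.2))} \\
&= \sum_{n=0}^{\infty} \sum_{i_n} \alpha^{n} u(i_n) \Pr\{T = n \text{ and } X_T = i_n \mid X_0 = i\}.
\end{aligned}$$

(We are able above to replace the Markov time $T$ by the fixed time $n$ since
$X_0 = i_0, \dots, X_n = i_n$ implies $T = n$.)

This last expression is a computation for the expectation of $\alpha^{T} u(X_T)$, and
accordingly we have

$$E[\alpha^{T} u(X_T) \mid X_0 = i] = E\Big[ \sum_{k=T}^{\infty} \alpha^{k} h(X_k) \,\Big|\, X_0 = i \Big]. \qquad (6.4)$$

Since $h$ is nonnegative, if $S$ is a Markov time that does not exceed $T$, then

$$\sum_{k=S}^{\infty} \alpha^{k} h(X_k) \ge \sum_{k=T}^{\infty} \alpha^{k} h(X_k).$$

Now take the expectation of both sides and compare with (6.4) to deduce

$$E[\alpha^{S} u(X_S) \mid X_0 = i] \ge E[\alpha^{T} u(X_T) \mid X_0 = i]. \qquad (6.5)$$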


It remains to justify interchanging the expectation and the limit $\alpha \to 1-$ in
(6.5) in order to complete the proof. We next prove the limit relation

$$\lim_{\alpha \to 1-} E[\alpha^{T} u(X_T) \mid X_0 = i] = E[u(X_T) \mid X_0 = i].$$

Let $I_A$ be the indicator of an event $A$:

$$I_A = \begin{cases} 1 & \text{if } A \text{ occurs}, \\ 0 & \text{if } A \text{ does not occur}. \end{cases}$$

Start with the identity

$$E[\alpha^{T} u(X_T) \mid X_0 = i] = E[u(X_T) \mid X_0 = i] - E[(1 - \alpha^{T}) u(X_T) \mid X_0 = i], \qquad (6.6)$$

and focus on the second term on the right. A standard partitioning and appropriate
analysis leads to the next succession of relations:

$$\begin{aligned}
E[(1 - \alpha^{T}) u(X_T) \mid X_0 = i] &= E[(1 - \alpha^{T}) u(X_T) I_{\{0 \le T < \infty\}} \mid X_0 = i] \\
&= E[(1 - \alpha^{T}) u(X_T) I_{\{0 \le T \le M\}} \mid X_0 = i] \\
&\qquad + E[(1 - \alpha^{T}) u(X_T) I_{\{M < T < \infty\}} \mid X_0 = i] \\
&\le (1 - \alpha^{M}) \|u\| + \|u\| \Pr\{M < T < \infty \mid X_0 = i\},
\end{aligned}$$

where $M$ is an arbitrary integer. Let $\alpha$ increase to 1 so that $(1 - \alpha^{M}) \|u\|$ goes to
zero for each fixed $M$. Therefore

$$\lim_{\alpha \to 1-} E[(1 - \alpha^{T}) u(X_T) \mid X_0 = i] \le \|u\| \Pr\{M < T < \infty \mid X_0 = i\},$$

where $M$ is arbitrary. Since

$$\lim_{M \to \infty} \Pr\{M < T < \infty \mid X_0 = i\} = 0,$$

it follows that

$$\lim_{\alpha \to 1-} E[(1 - \alpha^{T}) u(X_T) \mid X_0 = i] = 0,$$

and applying this in (6.6) confirms

$$\lim_{\alpha \to 1-} E[\alpha^{T} u(X_T) \mid X_0 = i] = E[u(X_T) \mid X_0 = i]. \qquad (6.7)$$


The same argument works for $S$. Recalling (6.5) and using (6.7) produces the
inequality

$$E[u(X_S) \mid X_0 = i] \ge E[u(X_T) \mid X_0 = i],$$

which completes the proof. ∎


This theorem leads to the following simple but important result in our study 
of optimal stopping. 


Theorem 6.2. Let $u$ be any bounded vector satisfying, for all $i$,

$$u(i) \ge r(i) \quad \text{and} \quad u(i) \ge \sum_j P_{ij} u(j). \qquad (6.8)$$

Then for any Markov time $T$,

$$u(i) \ge E[r(X_T) \mid X_0 = i] \quad \text{for all } i.$$

Proof. Since $r$ is a nonnegative vector, the conditions of (6.8) affirm that $u$ is
r-superregular. Now application of Theorem 6.1 yields

$$u(i) \ge E[u(X_T) \mid X_0 = i].$$

But $u(i)$ is never less than $r(i)$ by (6.8), so that

$$E[u(X_T) \mid X_0 = i] \ge E[r(X_T) \mid X_0 = i].$$

The last two inequalities together give the desired result. ∎


Remark. Theorem 6.2 remains valid when the nonnegative functions $r$ and $u$
are not bounded. To see this, create the bounded functions $r_N(i) = \min\{r(i), N\}$
and $u_N(i) = \min\{u(i), N\}$, for an arbitrary positive $N$. Then $u_N(i) \ge r_N(i)$ and
$u_N(i) \ge \sum_j P_{ij} u_N(j)$ by (a) and (b) of Lemma 6.1. Thus the theorem just proved
applies and we conclude $u_N(i) \ge E[r_N(X_T) \mid X_0 = i]$. But $u(i) \ge u_N(i)$ so $u(i) \ge
E[r_N(X_T) \mid X_0 = i]$. Now let $N$ increase. The left side is independent of $N$ while
the right converges to $E[r(X_T) \mid X_0 = i]$, whence $u(i) \ge E[r(X_T) \mid X_0 = i]$ as
claimed.

Any vector $u$ satisfying (6.8) is called an r-superregular majorant of $r(i)$.
Theorem 6.2 tells us that any r-superregular majorant of $r(i)$ bounds the optimal
reward vector $v$, where

$$v(i) = \sup E[r(X_T) \mid X_0 = i], \qquad (6.9)$$

and the supremum is extended over all Markov times $T$. Suppose there is
available a Markov time $T^*$ such that

$$w(i) = E[r(X_{T^*}) \mid X_0 = i] \qquad (6.10)$$

is an r-superregular majorant of $r(i)$. Then $T^*$ is optimal, i.e., achieves the
maximum expected reward, because

$$v(i) \ge E[r(X_{T^*}) \mid X_0 = i]$$

by (6.9), while Theorem 6.2 implies

$$w(i) = E[r(X_{T^*}) \mid X_0 = i] \ge v(i).$$

Hence

$$v(i) = w(i) = E[r(X_{T^*}) \mid X_0 = i]$$

and $T^*$ is optimal. This program often provides an expeditious method for
verifying that a given Markov time is optimal. We illustrate with some concrete
examples.
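
On a finite state space the two conditions of (6.8) can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name `is_superregular_majorant`, the tolerance, and the five-state symmetric walk with absorbing endpoints are all hypothetical choices made for illustration.

```python
def is_superregular_majorant(P, r, u, tol=1e-12):
    """Return True if u(i) >= r(i) and u(i) >= sum_j P[i][j] * u(j) for every i."""
    n_states = len(P)
    for i in range(n_states):
        if u[i] < r[i] - tol:
            return False  # u fails to majorize r at state i
        averaged = sum(P[i][j] * u[j] for j in range(n_states))
        if u[i] < averaged - tol:
            return False  # u fails to be superregular at state i
    return True


# Hypothetical example: symmetric walk on {0, ..., 4} with absorbing endpoints.
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
r = [0.0, 2.0, 0.0, 3.0, 0.0]

candidate = [0.0, 2.0, 2.5, 3.0, 0.0]   # a guessed majorant
constant = [3.0] * 5                    # the constant vector ||r||

checks = (
    is_superregular_majorant(P, r, candidate),
    is_superregular_majorant(P, r, constant),
    is_superregular_majorant(P, r, r),
)
```

Here `r` itself fails the test because $r(2) = 0$ is smaller than the average $(2 + 3)/2$ of its neighbors, while the constant vector passes in accordance with Lemma 6.1(a).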


Example 1. A correct answer for a contestant in a quiz show wins him a dollar
and gives him the option of leaving with his total winnings to date or attempting
another question. On the other hand, with a wrong answer, he forfeits his
cumulative winnings and must quit. What strategy maximizes his mean reward?

Let us suppose the questions are answered independently, each correctly
with known probability $p$ and incorrectly with probability $q = 1 - p$. We
let $X_n$ denote the cumulative winnings up to trial $n$, and introduce an artificial
state $\Delta$ to represent “out of the contest.” Then $X_0, X_1, \dots$ is a success run
Markov chain with $X_0 = 0$ and

$$X_{n+1} = \begin{cases} X_n + 1 & \text{with probability } p, \\ \Delta & \text{with probability } q = 1 - p. \end{cases}$$

With the reward function $r(x) = x$ for $x = 0, 1, \dots$ and $r(\Delta) = 0$ the problem,
stated formally, is to find a Markov time $T$ that maximizes $E[r(X_T) \mid X_0 = 0]$.

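Intuitively one might guess that the optimal strategy would be to stop the
first time the winnings reach some critical value $\xi$. That is, we should examine
Markov times of the form

$$T(\xi) = \min\{n : X_n \ge \xi \text{ or } X_n = \Delta\}$$

for a positive integer $\xi$.
Easily, the mean reward under such a rule is

$$v_\xi(i) = E[r(X_{T(\xi)}) \mid X_0 = i] = \begin{cases} \xi p^{\xi - i} & \text{for } i < \xi, \\ i & \text{for } i \ge \xi, \end{cases}$$

and $v_\xi(\Delta) = r(\Delta) = 0$.
Now considering only $X_0 = i = 0$ and looking at the ratio

$$\frac{v_{\xi+1}(0)}{v_\xi(0)} = \frac{(\xi + 1) p^{\xi+1}}{\xi p^{\xi}},$$

we see it pays to increase $\xi$ as long as $v_{\xi+1}(0)/v_\xi(0) > 1$. The optimal choice for
$\xi$ is, then,

$$\begin{aligned}
\xi^* &= \min\{\xi : p(\xi + 1)/\xi \le 1\} \\
&= \min\{\xi : p\xi + p \le \xi\} \\
&= \min\{\xi : \xi \ge p/(1 - p)\}.
\end{aligned}$$

In words, $\xi^*$ is the smallest integer greater than or equal to $p/(1 - p)$.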

We claim $T^* = T(\xi^*)$ is optimal. According to the comments preceding this
example all we need show is that $v(i) = v_{\xi^*}(i)$ is both r-superregular and nowhere
less than $r(i) = i$. Checking the second claim, $v(i) = r(i) = i$ for $i \ge \xi^*$, while for
$i < \xi^*$, by virtue of the definition of $\xi^*$ we have $p > i/(i + 1)$ and

$$r(i) = i = \xi^* \times \frac{i}{i+1} \times \frac{i+1}{i+2} \times \cdots \times \frac{\xi^* - 1}{\xi^*} < \xi^* p^{\xi^* - i} = v(i),$$

which verifies $v(i) \ge r(i)$ for all $i$. Returning to the first claim, if $i \ge \xi^*$, then

$$v(i) = i \ge p(i + 1) = p\,v(i + 1)$$

by the definition of $\xi^*$, while if $i < \xi^*$,

$$v(i) = \xi^* p^{\xi^* - i} = p\,v(i + 1).$$

Thus $v(i)$ is r-superregular.

Completing the argument, $v(i) \ge E[r(X_T) \mid X_0 = i]$ for any Markov time
$T$, by Theorem 6.2, but $v(i) = E[r(X_{T^*}) \mid X_0 = i]$ by construction. Therefore
$T^* = T(\xi^*)$ is optimal.
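
The quiz-show calculation is easy to reproduce with exact rational arithmetic. The sketch below is illustrative only: the names `optimal_threshold` and `threshold_value`, the variable `xi` for the critical value, and the choice $p = 7/10$ are assumptions made for the example, and the artificial "out of the contest" state is left implicit because its value and reward are both zero.

```python
from fractions import Fraction


def optimal_threshold(p):
    """Smallest positive integer xi with p * (xi + 1) <= xi, i.e. xi >= p / (1 - p)."""
    xi = 1
    while p * (xi + 1) > xi:
        xi += 1
    return xi


def threshold_value(p, xi, i):
    """Mean reward from winnings i when stopping once the winnings reach xi."""
    if i < xi:
        return xi * p ** (xi - i)   # need xi - i further correct answers
    return Fraction(i)              # already at or past the threshold: stop now


p = Fraction(7, 10)
xi_star = optimal_threshold(p)
best_value = threshold_value(p, xi_star, 0)

# Compare against other thresholds and verify the two conditions of (6.8).
beats_other_thresholds = all(
    best_value >= threshold_value(p, xi, 0) for xi in range(1, 31)
)
majorizes_reward = all(threshold_value(p, xi_star, i) >= i for i in range(40))
is_superregular = all(
    threshold_value(p, xi_star, i) >= p * threshold_value(p, xi_star, i + 1)
    for i in range(40)
)
```

For $p = 7/10$ the ratio $p/(1 - p)$ is $7/3$, so the contestant should stop on reaching 3 dollars, for a mean reward of $3 (7/10)^3 = 1029/1000$ from the start.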


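Example 2. As a second example, let $X_0, X_1, \dots$ be a Markov chain having as
its state space the set of integers $0, \pm 1, \pm 2, \dots$ and its transition probabilities

$$P_{i,i+1} = p, \qquad P_{i,i-1} = q = 1 - p.$$

We suppose $p < \tfrac{1}{2}$ so that $X_n$ is a random walk having a negative drift. With

$$r(i) = \begin{cases} i & \text{if } i \ge 0, \\ 0 & \text{if } i < 0, \end{cases}$$

we desire a Markov time $T$ that maximizes $E[r(X_T) \mid X_0 = i]$.
Again, it seems reasonable to suppose that the optimal strategy would
assume the form

$$T(a) = \begin{cases} \min\{n : X_n \ge a\} & \text{if } X_n \ge a \text{ for some } n, \\ \infty & \text{if } X_n < a \text{ for all } n, \end{cases}$$

for some optimally chosen integer $a$. Consequently, we evaluate

$$\begin{aligned}
v_a(i) &= E[r(X_{T(a)}) \mid X_0 = i] \\
&= \begin{cases} a \Pr\{T(a) < \infty \mid X_0 = i\}, & i < a, \\ i, & i \ge a, \end{cases} \\
&= \begin{cases} a \Pr\{\max_n X_n \ge a \mid X_0 = i\}, & i < a, \\ i, & i \ge a, \end{cases} \\
&= \begin{cases} a (p/q)^{a - i}, & i < a, \quad \text{(See Section 3 of Chapter 3.)} \\ i, & i \ge a. \end{cases}
\end{aligned}$$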




Looking at the ratios $v_{a+1}(0)/v_a(0)$, an argument analogous to that in the
previous example shows the maximizing $a = a^*$ is given by

$$a^* = \min\{a : p(a + 1) \le qa\} = \min\{a : a \ge p/(1 - 2p)\}.$$

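Again we claim $T^* = T(a^*)$ is optimal. As before, we need only show
$v(i) = v_{a^*}(i)$ is both superregular and not less than $r(i)$. Beginning with the
second claim, $v(i) = r(i) = i$ for $i \ge a^*$, while for $0 \le i < a^*$, by virtue of the
definition of $a^*$ we have $p/q > i/(i + 1)$ and

$$r(i) = i = a^* \times \frac{i}{i+1} \times \frac{i+1}{i+2} \times \cdots \times \frac{a^* - 1}{a^*} < a^* \Big(\frac{p}{q}\Big)^{a^* - i} = v(i).$$

Returning to the first claim, if $i > a^*$, then

$$\begin{aligned}
v(i) = i &\ge p(i + 1) + q(i - 1) \quad \text{(because } p < q\text{)} \\
&= p\,v(i + 1) + q\,v(i - 1),
\end{aligned}$$

while if $i \le a^*$,

$$\begin{aligned}
v(i) &= a^* \Big(\frac{p}{q}\Big)^{a^* - i} \\
&= p a^* \Big(\frac{p}{q}\Big)^{a^* - (i+1)} + q a^* \Big(\frac{p}{q}\Big)^{a^* - (i-1)} \\
&\ge p\,v(i + 1) + q\,v(i - 1).
\end{aligned}$$

Thus $v(i)$ is r-superregular.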

Completing the argument, by Theorem 6.2, $v(i) \ge E[r(X_T) \mid X_0 = i]$ for any
Markov time $T$, but we derived $v(i)$ from $v(i) = E[r(X_{T^*}) \mid X_0 = i]$. Therefore
$T^* = T(a^*)$ is optimal.
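
The random walk example can be checked the same way. This is a sketch under assumptions: the function names `optimal_level`, `level_value`, and `reward`, the value $p = 9/20$, and the finite window of states examined are all choices made for illustration, and exact fractions are used so that the equalities in the superregularity check hold without rounding.

```python
from fractions import Fraction


def optimal_level(p):
    """Smallest positive integer a with p * (a + 1) <= q * a, i.e. a >= p / (1 - 2p)."""
    q = 1 - p
    a = 1
    while p * (a + 1) > q * a:
        a += 1
    return a


def level_value(p, a, i):
    """Mean reward from state i when stopping the first time the walk is at a or above."""
    q = 1 - p
    if i < a:
        return a * (p / q) ** (a - i)   # a times Pr{walk ever reaches a from i}
    return Fraction(i)


def reward(i):
    return max(i, 0)


p = Fraction(9, 20)
q = 1 - p
a_star = optimal_level(p)

states = range(-20, 30)
majorizes_reward = all(level_value(p, a_star, i) >= reward(i) for i in states)
is_superregular = all(
    level_value(p, a_star, i)
    >= p * level_value(p, a_star, i + 1) + q * level_value(p, a_star, i - 1)
    for i in states
)
beats_other_levels = all(
    level_value(p, a_star, 0) >= level_value(p, a, 0) for a in range(1, 41)
)
```

With $p = 9/20$ we have $p/(1 - 2p) = 9/2$, so the rule stops the first time the walk reaches level 5.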


These two small examples illustrate a general procedure that can be applied 
in a remarkably large number of problems. The trick is to guess the optimal 
strategy and then to use Theorem 6.2 to verify that it is indeed optimal. When 
this shortcut to an answer cannot be taken, the more general approach that we 
now develop must be used. 

A vector $f$ is called the least r-superregular majorant of $r$ provided (i) $f$ is an
r-superregular majorant of $r$ and (ii) if $h$ is any other r-superregular majorant of
$r$ then $h(i) \ge f(i)$ for all $i$.

There is at most one least r-superregular majorant of $r$. Suppose to the
contrary there were two, say $f_1$ and $f_2$. Then the vector $f$ defined by

$$f(i) = \min\{f_1(i), f_2(i)\} \quad \text{for all } i,$$

would be an r-superregular majorant of $r$ by Lemma 6.1(b). And $f(i)$ would be
strictly smaller than some $f_1(i)$ or $f_2(i)$, thus forcing a contradiction, unless
$f_1(i) = f_2(i) = f(i)$ for all $i$.

Next we prove that at least one least r-superregular majorant of $r$ exists.
Let $H$ be the set of all r-superregular majorants of $r$. Observe first that $H$ is not
empty. Indeed the constant vector $h^*(i) = \|r\|$ for all $i$ patently belongs to $H$ by
Lemma 6.1(a). Define the vector $f$ by

$$f(i) = \inf\{h(i) : h \in H\} \quad \text{for all } i,$$

whence $f(i) \ge r(i)$ since each $h(i)$ is at least $r(i)$, and $f$ is r-superregular by Lemma
6.1(b). Thus $f$ is an r-superregular majorant of $r$, and it is the smallest such vector
by its very definition.

The least r-superregular majorant $f$ can be calculated by successive approximation.
Define the vectors $f_0, f_1, \dots$ recursively by

$$f_0(i) = r(i) \quad \text{for all } i,$$

and

$$f_{n+1}(i) = \max\Big\{r(i), \sum_j P_{ij} f_n(j)\Big\}. \qquad (6.11)$$

Then for every $n$, $f_n(i) \ge r(i)$. By induction we shall show

$$f_{n+1}(i) \ge f_n(i) \quad \text{for all } i.$$

Easily

$$f_1(i) = \max\Big\{r(i), \sum_j P_{ij} r(j)\Big\} \ge r(i) = f_0(i).$$

Suppose $f_n(j) \ge f_{n-1}(j)$ for all $j$. Then

$$\sum_j P_{ij} f_n(j) \ge \sum_j P_{ij} f_{n-1}(j)$$

and a fortiori

$$f_{n+1}(i) = \max\Big\{r(i), \sum_j P_{ij} f_n(j)\Big\} \ge \sum_j P_{ij} f_{n-1}(j). \qquad (6.12)$$

But $f_{n+1}(i) \ge r(i)$ so that, together with (6.12),

$$f_{n+1}(i) \ge \max\Big\{r(i), \sum_j P_{ij} f_{n-1}(j)\Big\} = f_n(i).$$

Thus, $f_n(i)$ is an increasing sequence in $n$ for each $i$ and we may define the limit

$$f_\infty(i) = \lim_{n \to \infty} f_n(i).$$

It will later emerge that $f_\infty(i)$ is finite, indeed bounded. We claim that $f_\infty(i)$ is the
least r-superregular majorant of $r(i)$. First $f_\infty(i) \ge r(i)$ holds since $f_n(i) \ge r(i)$
for each $n$. Since $f_\infty(i) \ge f_{n+1}(i)$, from the definition (6.11) we have

$$\begin{aligned}
f_\infty(i) &\ge \lim_{n \to \infty} \sum_{j=0}^{\infty} P_{ij} f_n(j) \\
&\ge \lim_{n \to \infty} \sum_{j=0}^{N} P_{ij} f_n(j) = \sum_{j=0}^{N} P_{ij} f_\infty(j),
\end{aligned}$$

where $N$ is an arbitrary positive integer. Letting $N$ increase indefinitely, we
conclude that

$$f_\infty(i) \ge \sum_j P_{ij} f_\infty(j).$$

Thus $f_\infty$ is an r-superregular majorant of $r(i)$. To show it is the smallest, let $h$
be an arbitrary r-superregular majorant of $r$. Then

$$h(i) \ge r(i) = f_0(i) \quad \text{for all } i. \qquad (6.13)$$

We next verify inductively that $h(i) \ge f_n(i)$ for each $n$. The case $n = 0$ was checked
above. Suppose $h(j) \ge f_{n-1}(j)$ for all $j$. Then $\sum_j P_{ij} h(j) \ge \sum_j P_{ij} f_{n-1}(j)$, and
since $h$ is r-superregular, $h(i) \ge \sum_j P_{ij} h(j)$, so that

$$h(i) \ge \sum_j P_{ij} f_{n-1}(j).$$

But this last inequality in conjunction with (6.13) shows that

$$h(i) \ge \max\Big\{r(i), \sum_j P_{ij} f_{n-1}(j)\Big\} = f_n(i).$$

Hence, $h(i) \ge f_n(i)$ for all $n$ and consequently

$$h(i) \ge \lim_{n \to \infty} f_n(i) = f_\infty(i).$$

The claim that $f_\infty = f$ is the least r-superregular majorant of $r(i)$ is hereby validated.
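
For a finite state space the recursion (6.11) translates directly into a short program. The sketch below is an illustration under assumptions: the function name `least_superregular_majorant`, the stopping tolerance, and the five-state symmetric walk with absorbing endpoints are hypothetical, and in general the iterates only approach the limit, so the loop stops once successive vectors agree to within `tol`.

```python
def least_superregular_majorant(P, r, tol=1e-12, max_iter=10_000):
    """Iterate f_{n+1}(i) = max{ r(i), sum_j P[i][j] * f_n(j) } starting from f_0 = r."""
    n_states = len(P)
    f = list(r)
    for _ in range(max_iter):
        g = [
            max(r[i], sum(P[i][j] * f[j] for j in range(n_states)))
            for i in range(n_states)
        ]
        # The iterates increase with n; stop when they no longer move.
        if max(abs(g[i] - f[i]) for i in range(n_states)) <= tol:
            return g
        f = g
    return f


# Hypothetical example: symmetric walk on {0, ..., 4} with absorbing endpoints.
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
r = [0.0, 2.0, 0.0, 3.0, 0.0]

f = least_superregular_majorant(P, r)
stop_states = [i for i in range(len(r)) if f[i] <= r[i]]
```

In this small example the iteration settles after a single step at $f = (0, 2, 2.5, 3, 0)$: only state 2 has its value raised above its reward, so it is the only state in which continuing is preferable to stopping.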
The next result begins the characterization of the optimal mean return 
function. 


Lemma 6.2. Let $f$ be the least r-superregular majorant of a nonnegative bounded
vector $r$. For each $\varepsilon > 0$ let $\Gamma_\varepsilon = \{i : f(i) \le r(i) + \varepsilon\}$ and let $T(\varepsilon) = \min\{n :
X_n \in \Gamma_\varepsilon\}$ ($T(\varepsilon) = \infty$ if $X_n \notin \Gamma_\varepsilon$ for all $n$). Then for every $\varepsilon > 0$

$$f(i) - \varepsilon \le E[r(X_{T(\varepsilon)}) \mid X_0 = i] \le f(i).$$

If the Markov chain has a finite state space, then

$$f(i) = E[r(X_{T(0)}) \mid X_0 = i].$$
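Proof. Prescribe $\varepsilon > 0$. Define

$$f_\varepsilon(i) = E[f(X_{T(\varepsilon)}) \mid X_0 = i].$$

We first want to show $f_\varepsilon(i) = f(i)$ for all $i$. Since $f$ is an r-superregular vector, by
Theorem 6.1,

$$f_\varepsilon(i) = E[f(X_{T(\varepsilon)}) \mid X_0 = i] \le f(i).$$

Thus, we need only prove $f_\varepsilon(i) \ge f(i)$. This will be accomplished if we can show
(i) $f_\varepsilon(i) \ge r(i)$ for all $i$ and (ii) $f_\varepsilon(i) \ge \sum_j P_{ij} f_\varepsilon(j)$, since $f$ is the smallest vector
satisfying (i) and (ii). For (ii), let $T' = \min\{n : n \ge 1 \text{ and } X_n \in \Gamma_\varepsilon\}$. (Note, in the
definition of $T'$, $n$ is restricted to exceed 0.) Then $T' \ge T(\varepsilon)$. Thus, by Theorem
6.1,

$$\begin{aligned}
f_\varepsilon(i) &= E[f(X_{T(\varepsilon)}) \mid X_0 = i] \\
&\ge E[f(X_{T'}) \mid X_0 = i]. \qquad (6.14)
\end{aligned}$$

Now, using the Markov property,

$$E[f(X_{T'}) \mid X_0 = i] = \sum_j P_{ij} E[f(X_{T'}) \mid X_1 = j].$$

But

$$E[f(X_{T'}) \mid X_1 = j] = E[f(X_{T(\varepsilon)}) \mid X_0 = j] = f_\varepsilon(j).$$

Taken together we have shown

$$f_\varepsilon(i) \ge \sum_j P_{ij} f_\varepsilon(j).$$

Now to show (i). Suppose $c = \sup_i \{r(i) - f_\varepsilon(i)\} \ge 0$. Then $f_\varepsilon(i) + c$ is r-superregular,
since $f_\varepsilon$ is by (ii) and a constant vector is regular. Moreover $f_\varepsilon(i) + c
\ge r(i)$ so that $f_\varepsilon + c$ is at least the least r-superregular majorant of $r(i)$; that is,

$$f_\varepsilon(i) + c \ge f(i) \quad \text{for all } i. \qquad (6.15)$$

Take $\alpha$ satisfying $0 < \alpha < \varepsilon$. Since $c = \sup_i \{r(i) - f_\varepsilon(i)\}$, there is a state $i_0$
for which

$$r(i_0) - f_\varepsilon(i_0) \ge c - \alpha. \qquad (6.16)$$

Because of (6.15)

$$0 \le f(i_0) - r(i_0) \le f_\varepsilon(i_0) + c - r(i_0) \le \alpha < \varepsilon.$$

Thus, $i_0 \in \Gamma_\varepsilon$ and consequently

$$f_\varepsilon(i_0) = E[f(X_{T(\varepsilon)}) \mid X_0 = i_0] = f(i_0).$$

According to (6.16)

$$c - \alpha \le r(i_0) - f(i_0) \le 0,$$

or

$$c \le \alpha.$$

But $\alpha$ is an arbitrary number in $(0, \varepsilon)$. We send $\alpha$ to zero, implying $c \le 0$. That is,
$f_\varepsilon(i) \ge r(i)$ for all $i$. This verifies requirement (i) attesting that $f_\varepsilon$ is an r-superregular
majorant of $r(i)$. Hence $f_\varepsilon(i) \ge f(i)$ for all $i$ since $f$ is the least majorant
of $r(i)$. But we already pointed out the relations $f_\varepsilon(i) \le f(i)$ for all $i$. Thus $f_\varepsilon = f$,
and

$$f(i) = E[f(X_{T(\varepsilon)}) \mid X_0 = i].$$

But note that $X_{T(\varepsilon)} \in \Gamma_\varepsilon$ so that $f(X_{T(\varepsilon)}) \le r(X_{T(\varepsilon)}) + \varepsilon$. Therefore

$$\begin{aligned}
f(i) &\le E[r(X_{T(\varepsilon)}) + \varepsilon \mid X_0 = i] \\
&\le E[r(X_{T(\varepsilon)}) \mid X_0 = i] + \varepsilon.
\end{aligned}$$

Thus

$$f(i) - \varepsilon \le E[r(X_{T(\varepsilon)}) \mid X_0 = i] \le f(i),$$

as was to be shown.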


When the state space is finite the same proof will work with $\varepsilon = 0$. We leave
this to the reader.
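
The finite-state conclusion of Lemma 6.2 can be observed numerically: compute $f$ by successive approximation, form the set of states where $f(i) = r(i)$, and compare $f$ with the expected reward of stopping on first entry to that set. The sketch below rests on several assumptions made for illustration only: the seven-state symmetric walk with absorbing endpoints, the reward vector, the function names, the fixed number of sweeps, and the small tolerance used to decide membership in the stopping set.

```python
def least_majorant(P, r, sweeps=1000):
    """Apply f <- max{ r, P f } a fixed number of times, starting from f = r."""
    n_states = len(P)
    f = list(r)
    for _ in range(sweeps):
        f = [
            max(r[i], sum(P[i][j] * f[j] for j in range(n_states)))
            for i in range(n_states)
        ]
    return f


def reward_of_first_entry(P, r, stop_set, start, max_steps=1000):
    """Approximate E[r(X_T) | X_0 = start] for T = first time the chain is in stop_set."""
    n_states = len(P)
    alive = [0.0] * n_states
    alive[start] = 1.0
    total = 0.0
    for _ in range(max_steps + 1):
        for j in stop_set:              # paths in the stopping set collect r and stop
            total += r[j] * alive[j]
            alive[j] = 0.0
        nxt = [0.0] * n_states
        for i, mass in enumerate(alive):
            if mass > 0.0:
                for j, p in enumerate(P[i]):
                    nxt[j] += mass * p
        alive = nxt
    return total


# Hypothetical example: symmetric walk on {0, ..., 6} with absorbing endpoints.
n = 7
P = [[0.0] * n for _ in range(n)]
P[0][0] = 1.0
P[n - 1][n - 1] = 1.0
for i in range(1, n - 1):
    P[i][i - 1] = 0.5
    P[i][i + 1] = 0.5
r = [0.0, 0.0, 3.0, 0.0, 0.0, 2.0, 0.0]

f = least_majorant(P, r)
gamma_0 = [i for i in range(n) if f[i] <= r[i] + 1e-9]
attained = [reward_of_first_entry(P, r, gamma_0, start=i) for i in range(n)]
```

In this example the stopping set is $\{0, 2, 5, 6\}$, and from state 3 the rule collects 3 with probability $2/3$ and 2 with probability $1/3$, matching $f(3) = 8/3$.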

A key by-product of this theorem is the result that v = f, that is, that the optimal 
reward vector is the smallest r-superregular majorant of r. This warrants a formal 
statement that offers an additional characterization of the optimal mean return. 


Theorem 6.3. The following are equivalent:

(a) $v(i) = \sup E[r(X_T) \mid X_0 = i]$, where the supremum is over all Markov
times $T$;

(b) $v(i) = \lim_{n \to \infty} f_n(i) = \sup_n f_n(i)$ where $f_0(k) = r(k)$ for all $k$ and
$f_{n+1}(k) = \max\{r(k), \sum_j P_{kj} f_n(j)\}$ for all $k$;

and

(c) $v(i) = \inf h(i)$ where the infimum is over all vectors $h$ satisfying, for all $i$,

$$h(i) \ge r(i) \quad \text{and} \quad h(i) \ge \sum_j P_{ij} h(j).$$


Proof. We have already shown that (b) and (c) are equivalent since (c) describes
the least r-superregular majorant of $r$ and (b) gives an algorithm that leads
to it. Comparing (c) and (a), if $v$ is the least r-superregular majorant of $r$ then
$v(i) \ge E[r(X_T) \mid X_0 = i]$ for all Markov times $T$ by Theorem 6.2 while, given a
positive $\varepsilon$, we have $v(i) - \varepsilon \le E[r(X_{T(\varepsilon)}) \mid X_0 = i]$ in the notation of Lemma 6.2.

Since there exist strategies having returns arbitrarily close to $v(i)$ we have
$v(i) = \sup E[r(X_T) \mid X_0 = i]$, and (a) and (c) are also equivalent. ∎


We have characterized the maximal mean reward but said nothing concerning
optimal strategies. Unfortunately, there may not exist optimal rules. As
an example, suppose $P_{i,i+1} = 1$, $i \ge 1$, so that the Markov chain moves
deterministically through successive states. Take $r(i) = 1 - (1/i)$. Then $v(i) = 1$
for all $i$, but there is no Markov time that will attain this expected reward.
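
To spell out why the supremum is not attained in this example (a short check added for illustration, using only the definitions above): started at $i$, the chain is at state $i + n$ at time $n$ with certainty, so a Markov time reduces to a fixed time $n$ or to never stopping, and

```latex
E[r(X_T) \mid X_0 = i] =
\begin{cases}
1 - \dfrac{1}{i + n} < 1 & \text{if } T = n < \infty, \\[1ex]
0 & \text{if } T = \infty,
\end{cases}
\qquad
\sup_{n \ge 0} \Big( 1 - \frac{1}{i + n} \Big) = 1 = v(i).
```

Every rule therefore earns strictly less than 1, although waiting longer brings the reward as close to 1 as desired.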

Nevertheless we can show that if there exists an optimal Markov time, then
$T(0) = \min\{n : r(X_n) = v(X_n)\}$ is optimal.

Theorem 6.4. If there exists an optimal Markov time $T^*$, i.e.,

$$v(i) = E[r(X_{T^*}) \mid X_0 = i] \quad \text{for all } i, \qquad (6.17)$$

then $T(0) = \min\{n : v(X_n) = r(X_n)\}$ is also optimal.

Proof. Recall that $v(i) \ge r(i)$ for all $i$. Hence, from (6.17),

$$v(i) = E[r(X_{T^*}) \mid X_0 = i] \le E[v(X_{T^*}) \mid X_0 = i] \le v(i).$$

That is,

$$E[r(X_{T^*}) \mid X_0 = i] = E[v(X_{T^*}) \mid X_0 = i].$$

Since $v(i) \ge r(i)$ for all $i$ we must have

$$\Pr\{v(X_{T^*}) > r(X_{T^*}) \mid X_0 = i\} = 0,$$

and

$$\Pr\{r(X_{T^*}) = v(X_{T^*}) \mid X_0 = i\} = 1.$$

That is, $\Pr\{X_{T^*} \in \Gamma_0 \mid X_0 = i\} = 1$. But $T(0)$ is the first $n$, if any, for which
$X_n \in \Gamma_0$. Hence $T(0) \le T^*$. From Theorem 6.1,

$$E[v(X_{T(0)}) \mid X_0 = i] \ge E[v(X_{T^*}) \mid X_0 = i] = v(i).$$

But $X_{T(0)} \in \Gamma_0$ so that $r(X_{T(0)}) = v(X_{T(0)})$. Hence

$$E[r(X_{T(0)}) \mid X_0 = i] \ge v(i).$$

This shows that $T(0)$ is optimal.


Elementary Problems


1. Consider an irreducible positive recurrent Markov chain with initial state $X_0 = i$.
Let $N_n(i)$ be the number of visits to state $i$ in the first $n$ trials. Prove that

$$\lim_{n \to \infty} \frac{E[N_n(i)]}{n} = \frac{1}{\mu_i},$$

where $\mu_i$ is the mean recurrence time to state $i$, i.e., $\mu_i = \sum_{n=1}^{\infty} n f_{ii}^{n}$.


2. Consider an irreducible but not necessarily recurrent Markov chain. Assume that
${}_0P_{00}^{*} = 1$ and ${}_0P_{0i}^{*} < \infty$ $(i \ge 1)$. Show that the sequence ${}_0P_{0i}^{*}$ is $l$-superregular.


3. Consider the gambler’s ruin on n + 1 states, where the transition probability matrix 
was given in Chapter 3. Find all r-regular vectors relative to this Markov chain. 


Solution: $u(i) = a\pi_i(C_0) + b\pi_i(C_n)$, $a$, $b$ arbitrary, using the notation of the example in
Chapter 3.


4. Let $\{U_j\}_{j=0}^{\infty}$ be a real solution of the equations

$$\sum_{j=0}^{\infty} P_{ij} U_j - U_i = 1, \qquad i = 0, 1, 2, \dots.$$

Let $E[U_{X_n} \mid X_0 = i]$ denote the expected value of the r.v. $U_{X_n}$ with the initial condition
$X_0 = i$. Prove that

$$\lim_{n \to \infty} \frac{E[U_{X_n} \mid X_0 = i]}{n} = 1.$$

Hint: Establish the identity $\sum_{j=0}^{\infty} P_{ij}^{n} U_j - U_i = n$, $i = 0, 1, 2, \dots$.

Problems 


1. Consider an irreducible positive recurrent Markov chain with initial state $X_0 = i$.
Let $N_n(i)$ be the number of visits to state $i$ in the first $n$ trials and let $T_m(i)$ denote the number
of trials until the $m$th visit to state $i$. Justify the relationship

$$\Pr\{T_m(i) \le n\} = \Pr\{N_n(i) \ge m\}.$$


2. Assume $\sum_{n=1}^{\infty} n^2 f_{ii}^{n} < \infty$. Since $T_m(i)$ is a sum of $m$ independent, identically distributed
random variables with mean $\mu_i$ and variance $\sigma_i^2$, use the central limit theorem and the
relationship given in Problem 1 to develop a limit distribution for $N_n(i)$ properly normalized,
i.e., find $a_n > 0$ and $b_n > 0$ such that $(N_n(i) - a_n)/b_n$ has a limit normal distribution.


3. Let $\{X_n, n \ge 0\}$ be an irreducible recurrent Markov chain with transition probability
matrix $P = \|P_{ij}\|$ and generalized invariant measure $\{v_j\}$, i.e., $\sum_i v_i P_{ij} = v_j$, $v_j > 0$, $j =
0, 1, \dots$.
Let $\{Y_n, n \ge 0\}$ be the imbedded process obtained by looking at the Markov chain only
at times when $X_n = 0$ or 1 (i.e., $Y_0 = X_{n_0}$, where $n_0$ is the first time the chain is in state
0 or 1; $Y_m = X_{n_m}$, where $n_m$ is the time of the $m$th return of the chain to states 0 or 1). Then
$\{Y_n, n \ge 0\}$ is also an irreducible recurrent Markov chain. Let $w_0$, $w_1$ denote the stationary
probabilities of the imbedded Markov chain. Show that $w_1/w_0 = v_1/v_0$.


Hint: Use the interpretation of v; as given in Theorems 3.3 and 3.4. 


4. Let $P = \|P_{ij}\|$ and $P_n = \|P_{ij}(n)\|$, $n = 1, 2, \dots$, be transition probability matrices of
irreducible recurrent Markov chains and let $\{v_i\}$ and $\{v_i^{(n)}\}$, $n = 1, 2, \dots$, be the corresponding
invariant measures normalized so that $v_0 = v_0^{(n)} = 1$ for all $n$.

Prove that if $P_{ij}(n) \to P_{ij}$ for all $i, j$ as $n \to \infty$ then $v_i^{(n)} \to v_i$ for all $i$.


Hint: Prove that $\lim_{n \to \infty} v_i^{(n)} = w_i$ exists and satisfies

$$\sum_i w_i P_{ij} \le w_j, \qquad j = 0, 1, 2, \dots,$$

and $w_0 = 1$. Then use the uniqueness property of Theorem 5.3.


5. In an irreducible Markov chain prove that any nonnegative r-superregular sequence
$\{u(i)\}$ has the property that

$$u(i) \ge f_{ik}^{*} u(k) \quad \text{for every } i \text{ and } k,$$

where $f_{ik}^{*}$ is the probability of reaching state $k$ from state $i$.


Hint: Consult the proof of Theorem 5.2. 


6. Let $\{u(i)\}$ be a finite nonnegative r-superregular sequence. Define

$$w(i) = u(i) - \sum_j P_{ij} u(j).$$


Show that the set A of all states i for which w(i) > 0 is a set of nonrecurrent states. 


Hint: Modify slightly the conclusion of Problem 5 for the case at hand to deduce strict 
inequality and apply the result appropriately. 


7. Consider an irreducible Markov chain. Fix some state; call it 0. Prove that if $f_{i0}^{*} \ge \alpha > 0$
for every $i \ne 0$ then the chain is recurrent.


Hint: Show that the probability of visiting state 0 only finitely often has probability zero 
(cf. Theorems 7.1 and 7.2 of Chapter 2). 


8. Show that an irreducible Markov chain is nonrecurrent if and only if there exists a
finite-valued $l$-superregular sequence $u(i)$ such that

$$u(k) > \sum_i u(i) P_{ik} \quad \text{for some state } k.$$


Hint: (sufficiency) Use Theorems 5.3, 3.3, and 3.4; (necessity) consult Eq. (5.5). 


9. Consider an electric circuit with $m$ boundary and $n$ interior positions denoted by
$B_1, B_2, \dots, B_m$ and $A_1, A_2, \dots, A_n$, respectively, as in the accompanying diagram.
Thus there are $m + n$ positions altogether. The current flowing from position $i$ to $j$ is
$I_{ij}$ and the resistance between $i$ and $j$ is $R_{ij}$ when $i$ and $j$ are not both boundary positions.
The potential at position $i$ is $V_i$. Assume that the resistances are all known and the potentials
at boundary positions are also given. Ohm’s law states that

(i) $V_i - V_j = R_{ij} I_{ij}$

and Kirchhoff’s first law asserts that

(ii) $\sum_j I_{ij} = 0.$

Using (i) and (ii) prove that $V_i$ satisfies the relations

(iii) $V_i = \Big( \sum_j \dfrac{1}{R_{ij}} \Big)^{-1} \sum_j \dfrac{V_j}{R_{ij}}.$

By interpreting (iii) show that we may consider a random walk on the positions of the
electric network, interpreting the boundary positions as absorbing states and defining
transition probabilities as appropriate expressions of the resistance $R_{ij}$.


10. In Problem 9 order the states of the random walk in such a fashion that states 1,
2, ..., $m$ are all the boundary positions and states $m + 1, m + 2, \dots, m + n$ are the interior
positions. Prove that the potential at an interior position $A_i$ may be expressed as

$$V_{m+i} = \sum_{k=1}^{m} b_{ik} V_k \quad \text{for } i = 1, 2, \dots, n,$$

where $b_{ik}$ is the probability of absorption at the boundary position $B_k$ having started at the
interior position $A_i$.


11. Let $\|P_{ij}\|$ be a transition probability matrix of a recurrent null or transient irreducible
Markov chain. Let $\{u_i\}$ be an $l$-regular positive vector and let $A$ be a collection of states. We
introduce the notation $P_{iA}^{n} = \sum_{j \in A} P_{ij}^{n}$. Prove that if $u(A) = \sum_{i \in A} u_i < \infty$ then $P_{iA}^{n} \to 0$
as $n \to \infty$.


Hint: Let $B$ be a finite subset of $A$ such that $\sum_{i \in A - B} u_i < \varepsilon$. Show that $P_{i,A-B}^{n} \le \varepsilon/u_i$ and
use this fact. (The notation $A - B$ designates the set of states in $A$ excluding the states of $B$.)

12. Let $\|P_{ij}\|$ be the transition probability matrix for a Markov chain with an infinite
number of states. Assume $P_{ij} = 0$ for $j > i + 1$, and $P_{i,i+1} > 0$ for each $i$. Define a collection
of polynomials $Q_i(z)$ recursively by the relations

$$Q_0(z) = 1,$$
$$z Q_i(z) = P_{i,i+1} Q_{i+1}(z) + \sum_{j=0}^{i} P_{ij} Q_j(z), \qquad i = 0, 1, 2, \dots.$$

Let $T_j$ be the first passage time into state $j + 1$ starting from state $j$, and let $f_j(z) = \sum_{n=1}^{\infty}
\Pr\{T_j = n\} z^{n}$ be the generating function of $T_j$. Show that

$$f_j(z) = Q_j(1/z)/Q_{j+1}(1/z).$$


Hint: Derive a recursion formula for $\Pr\{T_j = n\}$ considering the possibilities of what
happens at the first transition. Then pass to generating functions.


13. Prove the following identities for three states in the same positive recurrent class 
GEk): 


(a) Mig + Myj — Mij = f Bly, + Mj) 
©) Milh 
Ma f kj 


where my, = Yoi nfk- 


14. Let $Y_1, Y_2, \dots$ be independent identically distributed integer valued random variables
having negative mean $\mu = E[Y_1] < 0$ and finite variance. Consider stopping the Markov
chain $X_0 = i$, $X_n = X_{n-1} + Y_n = X_0 + Y_1 + \cdots + Y_n$ under the reward function

$$r(k) = \begin{cases} k & \text{if } k \ge 0, \\ 0 & \text{if } k < 0. \end{cases}$$

Define $M = \max\{0, Y_1, Y_1 + Y_2, \dots\}$, $A = E[M]$, and

$$v(i) = E[(i + M - A)^{+}],$$

where $x^{+} = \max\{x, 0\}$ is the positive part of $x$.
Show (a) $v$ is an r-superregular function; (b) $v(k) \ge r(k)$ for all $k$; and (c)

$$v(i) = E[r(X_{T(A)}) \mid X_0 = i],$$

where

$$T(A) = \begin{cases} \min\{n : X_n \ge A\}, \\ \infty & \text{if } X_n < A \text{ for all } n. \end{cases}$$

Conclude that $T(A)$ maximizes $E[r(X_T) \mid X_0 = i]$ over all Markov times $T$.


Hint: For (a) and (b), use the fact that $M$ has the same distribution as $(Y + M)^{+}$, where
$Y$, independent of $M$, has the same distribution as each $Y_n$.


15. Let $X_n$ be a Markov chain taking possible values in $\{a, a + 1, \dots, b\}$, where $a$ and $b$
are absorbing states, and $\Pr\{X_{n+1} = i + 1 \mid X_n = i\} = \Pr\{X_{n+1} = i - 1 \mid X_n = i\} = \tfrac{1}{2}$ for
$a < i < b$. Let a nonnegative reward vector $r(i) \ge 0$ be given with $r(a) = r(b) = 0$.
Show that the least right superregular majorant of $r$ is the smallest concave function $f$
that exceeds $r$. If adjacent points on the graph $(i, f(i))$ are connected, the result is the convex
hull of the points $(i, r(i))$.


16. An urn contains n distinct numbered balls. We have no knowledge of what these 
numbers are. We will draw the balls one by one from the urn, and halt when we think the 
ball just drawn from the urn is the largest of the n numbers. The object is to maximize the 
probability of being right. 

(a) Consider strategies of the form: Draw the first m balls; then halt as soon as a ball 
is removed with a number larger than any of the first m numbers. What is the probability 
of being right? 

(b) Choose p between 0 and 1 and let m = [np] (the greatest integer in np). Which 
value of p is asymptotically best as n > oo in the sense of maximizing the probability of 
being right? 


17. Let P = ||P;;|| be the transition matrix and x = ||z,|| be the stationary distribution 
associated with an irreducible positive recurrent Markov chain. Show that there exists a 
finite constant C for which 


aise for all n, j 
T) 

and thus 

È [Pi — Bl < oœ. 

j 


Show 
È È Po; — Bl < o, 
n j 


provided Xp 17f}9 < ©. 


18. A Markov chain {X,,} whose transition matrix is P = ||P;,|| is said to be symmetrically 
reversible with respect to the vector a = ||a;|| if Q = P (where Q;; = «; P;;/a; as in (4.2)). Let 
$\{X_n\}$ be an aperiodic, irreducible null recurrent Markov chain having the generalized
stationary distribution a, where «; > Ofor all i. Show that, if {X,,} is symmetrically reversible 
with respect to a, then the following ratio limit holds: 


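19. Let $\{X_n\}$ be an aperiodic irreducible Markov chain on the states 0, 1, ... and with
transition matrix $P = \|P_{ij}\|$. Assume the following properties: (i) $P_{01} = 1$ and (ii)
$E[T_0 \mid X_0 = i] < \infty$ for all states $i$, where $T_0 = \min\{n > 0 : X_n = 0\}$ is the hitting time to
state 0. Let $Q$ be the matrix whose elements are $P_{ij}$, but restricted to $i, j \ge 1$. Thus $P$ has
the form

$$P = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & \cdots \\
P_{10} & Q_{11} & Q_{12} & \cdots & Q_{1j} & \cdots \\
P_{20} & Q_{21} & Q_{22} & \cdots & Q_{2j} & \cdots \\
\vdots & \vdots & \vdots & & \vdots & \\
P_{i0} & Q_{i1} & Q_{i2} & \cdots & Q_{ij} & \cdots \\
\vdots & \vdots & \vdots & & \vdots &
\end{pmatrix}$$

Let $N$ be the matrix with elements $N_{ij} = \sum_{n=0}^{\infty} {}_0P_{ij}^{n} = \sum_{n=0}^{\infty} (Q^{n})_{ij}$ for $i, j \ge 1$.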

(a) Show that N is the minimal right (and left) nonnegative inverse of I — Q, where I 
is an infinite-dimensional identity matrix. 

(b) Show that the stationary distribution « = |\a;|| of {X,} is proportional to 
(1, Niis N12, - --) by (i) using an algebraic argument involving (a) and (ii) using a renewal 
argument. 

(c) Age distributions. Motivated by the current age problem in renewal theory (see 
Problem 10 of Chapter 5), given that X,, = j > 0, we say the process is ofagek if X,_, = O but 
X,-1 # 0 for 0 < | < k. Determine the limiting age distribution 


g® = lim Pr{X,_, = 0, X,_, 4 Ofor0 <1 <k|X,=j}, k21. 
Answer: gi? = oP Y/N, ),k > 1. 
(d) Find the mean limiting age given X, = j. That is, determine p>; kg". 
Answer: X o1 Ny,N,/N13- 


(e) Let L(j) be the last time that the process visits state j before hitting Tọ, with 
L(j) = œ should these be no such last visit. Then L; = k if and only if Tọ > k, X, = j, and 
X, #jforn=k + 1,..., To. Show that 


Pr{L; = k|Xo = 1} = Pr{Lj < œ| Xo = lpP/Ny,, k20. 


20. Let {X,} be a Markov chain on the states 0, 1, ... with transition matrix P = ||P;;ll. 
Assume that Poo = 1 and that E[To|Xo = j] < œ for all states j where Tọ = min{n > 0: 
X „= 0} is the hitting time to state 0. Let Q be the matrix whose elements are P;, but re- 
stricted to i, j > 1. Thus P has the form 


1 0 0 
P= Pio Qu Qi 


P20 Q21 Q22 


Let N be the matrix with elements N; = } g-o QS fori, j > 1. Next, let {X,,} be the Markov 
chain derived from {X,} by restarting at state 1 whenever state 0 is reachéd. Then {X,,} has 
transition matrix with elements Py, = 1 and P,, = Pj, fori > 1,j > 0. 

Let $\{Y_n\}$ be the time reversal of $\{\tilde X_n\}$ with respect to the stationary distribution in
Problem 19(b).

(a) Give the transition probabilities for $\{Y_n\}$.

(b) Let $\{\hat Y_n\}$ be the process derived from $\{Y_n\}$ by making state 0 absorbing. Show that

$$\Pr\{\hat T_0 = n \mid \hat Y_0 = j\} = {}_0P_{1j}^{\,n-1}/N_{1j}, \qquad n \ge 1,$$

where $\hat T_0$ is the absorption time for $\{\hat Y_n\}$. (Compare with Problem 19(c).)


NOTES 


This chapter serves merely as a bare introduction to an important and expanding area in 
probability theory. 

A useful book on the subject of potential theory in Markov chains is the comprehensive
work of Kemeny, Snell, and Knapp [2]. Bibliographic and historical references on potential
theory and boundary theory for Markov chains are found in the above work.

A very modern approach to the study of Markov processes via potential theory is 
contained in Meyer [3]. 

Dynkin and Yushkevich [5] have written a nice little book that covers aspects of both 
Markov chain potential theory and optimal stopping. 

A treatment of Brownian motion and certain stable processes in N dimensions with 
emphasis on potential theoretic concepts is given in Port and Stone [4]. 

Various classes of optimal stopping problems and gambling systems as part of mathematical
statistics are covered in Chow, Robbins and Siegmund [6].

Chapter 12


SUMS OF INDEPENDENT RANDOM
VARIABLES AS A MARKOV CHAIN


This chapter is a simplified version of a few topics from the theory of sums of
independent random variables. Some additional results on sums of independent
random variables appear in Chapter 17 on fluctuation theory.


1: Recurrence Properties of Sums of Independent Random Variables 


Let $X_1, X_2, \ldots$ be a sequence of integer-valued, independent, identically
distributed random variables and define $S_n = X_1 + X_2 + \cdots + X_n$ for
$n = 1, 2, \ldots$. For completeness we also define $S_0 = 0$.


In this chapter we discuss some aspects of sums of independent random
variables $S_n$, $n = 0, 1, \ldots$, regarding them as successive values of a
discrete-valued Markov chain of special structure. We barely touch the surface in our
treatment of the extensive theory of sums of independent random variables,
as much of it is beyond the level of this book. For a more complete account of
this elegant and rich theory we refer the reader to Spitzer (see references at the
close of this chapter).

In Example A, Section 2, Chapter 2, we mentioned the sequence $S_n$ (where the
$X_i$ are nonnegative integer valued) as an example of a Markov chain. In the
present context the state space consists of all integers: positive, negative, and
zero. The initial state is prescribed to be zero since we set $S_0 = 0$. The special
feature of the Markov chain $\{S_n\}$ is its spatially homogeneous character, in that the
one-step transition probabilities have the property $\Pr\{S_n = j \mid S_{n-1} = i\} =
P_{ij} = P_{0,j-i} = P_{i-j,0}$. A simple induction shows that the same is true for the
n-step transition probabilities; thus for any $n \ge 1$

$$P_{ij}^n = P_{0,j-i}^n = P_{i-j,0}^n = \Pr\{S_{n+m} = j \mid S_m = i\}.$$
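Spatial homogeneity of the n-step probabilities can be checked numerically. The following is a minimal sketch, not part of the original text: the step law `step_dist` and the helper `n_step_from` are hypothetical names chosen for illustration, and the n-step law is obtained by repeated convolution with the law of a single step.

```python
from collections import defaultdict


def n_step_from(start, step_dist, n):
    """Distribution of S_n given S_0 = start, by n-fold convolution."""
    dist = {start: 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for state, prob in dist.items():
            for step, p in step_dist.items():
                nxt[state + step] += prob * p
        dist = dict(nxt)
    return dist


# Hypothetical law of a single step X_k (an assumption made for this sketch).
step_dist = {-1: 0.3, 0: 0.2, 2: 0.5}

n, i, j = 5, 3, 4
p_ij = n_step_from(i, step_dist, n).get(j, 0.0)           # P^n_{ij}
p_0_shift = n_step_from(0, step_dist, n).get(j - i, 0.0)  # P^n_{0,j-i}
p_shift_0 = n_step_from(i - j, step_dist, n).get(0, 0.0)  # P^n_{i-j,0}
print(p_ij, p_0_shift, p_shift_0)
```

The three printed numbers should agree up to rounding, since each depends only on the displacement $j - i$.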


We will assume in effect throughout this chapter that the random variable
$X_k$ is "irreducible." By this we shall mean that the Markov chain with transition
probability matrix $P_{ij} = \Pr\{S_n = j \mid S_{n-1} = i\}$ is irreducible. Simple criteria
which guarantee irreducibility will be indicated later. We will also stipulate,
without repeating this at every occasion, that $X_k$ is nondegenerate, i.e., it has
at least two possible values.

We now determine some simple conditions for recurrence and transience
appropriate for the Markov chain generated by $\{S_n\}$. For this purpose we
introduce the following quantities

$$G_{ij}^n = \sum_{m=0}^{n} P_{ij}^m, \qquad G_{ij} = \sum_{m=0}^{\infty} P_{ij}^m \le +\infty,$$

for all integers i, j and $n = 0, 1, 2, \ldots$. The expression $G_{ij}$ is called the "Green
function" and relates to the potential theory development alluded to in Chapter
11.


Lemma 1.1.
$$G_{ij}^n \le G_{00}^n, \qquad n = 0, 1, 2, \ldots, \tag{1.1}$$
for all integers i, j. In particular, as $n \to \infty$ we have
$$G_{ij} \le G_{00} \tag{1.2}$$
for all integers i, j.


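Proof. Because of spatial homogeneity we have $G_{ij}^n = G_{i-j,0}^n$. Thus it is
sufficient to prove $G_{i0}^n \le G_{00}^n$ for all integers $i = 0, \pm 1, \pm 2, \ldots$ and $n = 0,
1, 2, \ldots$. But

$$G_{i0}^n = \sum_{m=0}^{n} P_{i0}^m = \sum_{m=0}^{n} \sum_{l=0}^{m} f_{i0}^{m-l} P_{00}^{l}
= \sum_{l=0}^{n} P_{00}^{l} \sum_{m=l}^{n} f_{i0}^{m-l} = \sum_{l=0}^{n} P_{00}^{l} \sum_{r=0}^{n-l} f_{i0}^{r},$$

where $f_{i0}^r$ is the probability of reaching 0 from i for the first time at the rth step.
Since

$$\sum_{r=0}^{n-l} f_{i0}^{r} \le 1, \qquad G_{i0}^n \le \sum_{l=0}^{n} P_{00}^{l} = G_{00}^n. \qquad ∎$$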
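The inequality of Lemma 1.1 can be observed numerically. The sketch below is illustrative only: the step law `step_dist` and the helper `partial_green` are hypothetical names, and the partial Green function $G_{0j}^n$ is accumulated by convolving the step law n times.

```python
def partial_green(step_dist, n, start=0):
    """Return {j: G^n_{start,j}}, where G^n = sum over m = 0..n of P^m."""
    dist = {start: 1.0}
    green = {start: 1.0}  # the m = 0 term
    for _ in range(n):
        nxt = {}
        for state, prob in dist.items():
            for step, p in step_dist.items():
                nxt[state + step] = nxt.get(state + step, 0.0) + prob * p
        dist = nxt
        for state, prob in dist.items():
            green[state] = green.get(state, 0.0) + prob
    return green


# Hypothetical step law (an assumption made for this sketch).
step_dist = {-2: 0.25, 1: 0.5, 3: 0.25}

green = partial_green(step_dist, 40)
g00 = green[0]
worst = max(green.values())
print(g00, worst)
```

In accordance with (1.1), no entry of `green` should exceed the diagonal value `g00`.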


An elegant and useful criterion for recurrence is the content of the following 
theorem. 


Theorem 1.1. If

$$E[|X_k|] = E[|X_1|] = \sum_{j=-\infty}^{\infty} |j|\, P_{0j} < \infty, \qquad k = 2, 3, \ldots, \tag{1.3}$$

and

$$\mu = E[X_k] = E[X_1] = \sum_{j=-\infty}^{\infty} j P_{0j} = 0, \tag{1.4}$$

then the Markov chain $\{S_n\}$ is recurrent.

Remark. Since $E[X_k] = 0$ and $X_k$ is a nondegenerate random variable, we
infer that there are positive and negative values that $X_k$ may achieve with
positive probability. The irreducibility postulate imposed at the start asserts
that the Markov chain generated by $S_n$, $n = 0, 1, 2, \ldots$, is irreducible (consists
of one class) and its state space comprises all integers (positive, negative, and
zero). Therefore, in accordance with Corollary 5.1 of Chapter 2, it is enough in
verifying the recurrence property to establish recurrence for a single state (say
the zero state).


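Proof. From (1.1) we know that $G_{0j}^n \le G_{00}^n$ for all integers j. The same
inequality is preserved by averaging. Hence, for any positive integer M,

$$G_{00}^n \ge \frac{1}{2M+1} \sum_{j=-M}^{M} G_{0j}^n. \tag{1.5}$$

But

$$\sum_{j=-M}^{M} G_{0j}^n = \sum_{|j| \le M} \sum_{m=0}^{n} P_{0j}^m = \sum_{m=0}^{n} \sum_{|j| \le M} P_{0j}^m
\ge \sum_{m=0}^{n} \sum_{|j| \le mM/n} P_{0j}^m, \tag{1.6}$$

and the last inequality holds trivially since $m \le n$. Comparing (1.5) and (1.6)
we see that

$$G_{00}^n \ge \frac{1}{2M+1} \sum_{m=0}^{n} \sum_{|j| \le mM/n} P_{0j}^m. \tag{1.7}$$

Now, by definition,

$$P_{0j}^m = \Pr\{S_m = j \mid S_0 = 0\}. \tag{1.8}$$

Since $S_k$ is the sum of k independent identically distributed random variables
with finite mean, $\mu = E[X_1] = E[S_1] = 0$, the weak law of large numbers
prevails (see Section 1, Chapter 1, page 19), which asserts specifically that for
any prescribed $\epsilon > 0$,

$$\Pr\left\{\left|\frac{S_m}{m} - \mu\right| < \epsilon\right\} = \Pr\left\{\left|\frac{S_m}{m}\right| < \epsilon\right\} \to 1, \qquad \text{as } m \to \infty. \tag{1.9}$$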
m m 


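From definition (1.8), we have

$$\Pr\{|S_m| \le m\epsilon\} = \sum_{|j| \le [m\epsilon]} P_{0j}^m.$$

(Here [h] designates the greatest integer which does not exceed h; so $h - 1 < [h] \le h$.)
The limit relation (1.9) can be expressed equivalently in the form

$$H_m(\epsilon) = \sum_{|j| \le [m\epsilon]} P_{0j}^m \to 1, \qquad \text{as } m \to \infty. \tag{1.10}$$

Now, choose $M = [n\epsilon]$ in (1.7), where $\epsilon > 0$. Then

$$G_{00}^n \ge \frac{1}{2[n\epsilon] + 1} \sum_{m=0}^{n} \sum_{|j| \le m[n\epsilon]/n} P_{0j}^m
\ge \frac{1}{2[n\epsilon] + 1} \sum_{m=0}^{n} \sum_{|j| \le [m\epsilon]} P_{0j}^m
= \frac{n+1}{2[n\epsilon] + 1} \cdot \frac{1}{n+1} \sum_{m=0}^{n} H_m(\epsilon). \tag{1.11}$$

It follows from (1.10) that

$$\frac{1}{n+1} \sum_{m=0}^{n} H_m(\epsilon) \to 1, \qquad \text{as } n \to \infty. \tag{1.12}$$

Further,

$$\lim_{n\to\infty} \frac{n+1}{2[n\epsilon] + 1} = \lim_{n\to\infty} \frac{n+1}{2n\epsilon + 1} = \frac{1}{2\epsilon}.$$

From (1.11) and (1.12) we conclude that

$$\lim_{n\to\infty} G_{00}^n \ge \frac{1}{2\epsilon}.$$

Since $\epsilon > 0$ may be chosen arbitrarily small, we have shown that

$$\sum_{k=0}^{\infty} P_{00}^k = \lim_{n\to\infty} G_{00}^n = \infty.$$

Finally, consulting Theorem 5.1 of Chapter 2 we recall that $G_{00} = \infty$ is
equivalent to the assertion that the zero state is recurrent. ∎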


Notice that we did not use the full force of the hypothesis that $X_k$ has finite
mean value. Scrutiny of the proof reveals that the conclusion of the
theorem remains valid provided merely that the weak law of large numbers is
applicable as expressed in (1.9).

In the following theorem, which is a partial converse of Theorem 1.1, the
existence of a mean is more decisively used.


Theorem 1.2. If

$$E[|X_i|] = \sum_{j=-\infty}^{\infty} |j|\, P_{0j} < \infty, \qquad i = 1, 2, \ldots, \tag{1.13}$$

and

$$\mu = E[X_i] = \sum_{j=-\infty}^{\infty} j P_{0j} \ne 0,$$

then the Markov chain $\{S_n\}$ is transient.

Proof. Let $A_n$ denote the event $\{S_n = 0\}$.
We recall the criterion of recurrence in the form

$$\Pr\{A_n \text{ occurs for infinitely many } n\} =
\begin{cases}
1 & \text{if and only if the Markov chain } \{S_n\} \text{ is recurrent,} \\
0 & \text{if and only if the Markov chain } \{S_n\} \text{ is transient}
\end{cases} \tag{1.14}$$

(see Theorem 7.1 of Chapter 2).
The proof of Theorem 1.2 makes use of the strong law of large numbers (cf.
Section 1, Chapter 1), which states that

$$\Pr\left\{\lim_{n\to\infty} \frac{S_n}{n} = \mu\right\} = 1. \tag{1.15}$$

Now if $\mu \ne 0$ we consider the events

$$C_n = \left\{\left|\frac{S_n}{n} - \mu\right| > \frac{|\mu|}{2}\right\}, \qquad n = 1, 2, 3, \ldots,$$

and let C be the event that $C_n$ occurs for infinitely many n. We will evaluate
$\Pr\{C\}$. Any realization of the process for which $\lim_{n\to\infty} S_n/n = \mu$ obviously
cannot belong to the event C. But according to (1.15) the realizations of the
process fulfilling $\lim_{n\to\infty} S_n/n = \mu$ have probability 1. Therefore Pr{complement
of C} = 1, or $\Pr\{C_n \text{ occurs for infinitely many } n\} = 0$. But plainly the event $A_n$
implies the event $C_n$, i.e., $A_n \subset C_n$. Hence

$$\Pr\{A_n \text{ occurs for infinitely many } n\} \le \Pr\{C\} = 0.$$

Taking cognizance of (1.14) we conclude that the Markov chain $\{S_n\}$ is transient. ∎
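The dichotomy between Theorems 1.1 and 1.2 can be seen numerically in the simplest case. The sketch below assumes the coin-tossing walk with steps $+1$ (probability p) and $-1$ (probability $q = 1 - p$), for which $P_{00}^{2m} = \binom{2m}{m} p^m q^m$ and odd-step returns are impossible; the function name `partial_green_00` is a hypothetical helper, not notation from the text.

```python
def partial_green_00(p, n_max):
    """G_00 truncated at 2*n_max steps for the +/-1 walk:
    sum over m = 0..n_max of C(2m, m) * (p*q)**m."""
    q = 1.0 - p
    term, total = 1.0, 1.0  # the m = 0 term
    for m in range(1, n_max + 1):
        # C(2m, m) / C(2m-2, m-1) = 2 * (2m - 1) / m
        term *= 2.0 * (2 * m - 1) / m * p * q
        total += term
    return total


for n_max in (100, 1_000, 10_000):
    print(n_max, partial_green_00(0.5, n_max), partial_green_00(0.6, n_max))
```

With $p = 1/2$ (mean zero) the partial sums keep growing, roughly like $2\sqrt{n/\pi}$, in line with recurrence; with $p = 0.6$ they settle near $1/\sqrt{1 - 4pq} = 5$, in line with transience.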


2: Local Limit Theorems 


Note that if the Markov chain $\{S_n\}$ is recurrent, it can only be null recurrent.
This is so because, by spatial homogeneity, we have

$$\pi_i = \lim_{n\to\infty} P_{ii}^n = \lim_{n\to\infty} P_{00}^n = \pi_0 \qquad \text{for all } i.$$

Thus, $\pi_0 > 0$ would imply $\sum_{i=-\infty}^{\infty} \pi_i = \infty$, which is impossible. Hence, $\pi_i = 0$
for all i.

Since the Markov chain $\{S_n\}$ is always either transient or null recurrent,
we know that for all j

$$P_{0j}^n \to 0, \qquad \text{as } n \to \infty.$$

It is of some interest to determine the rate of convergence to zero. Such a result
is referred to as a local limit theorem. To analyze this problem we introduce
the characteristic function

$$\phi(\theta) = \sum_{\nu=-\infty}^{\infty} P_{0\nu} \exp(i\nu\theta) = E[\exp(iX_k\theta)], \qquad k = 1, 2, \ldots, \quad -\pi \le \theta \le \pi, \tag{2.1}$$


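where the series converges absolutely and uniformly. We claim that

$$[\phi(\theta)]^n = \sum_{\nu=-\infty}^{\infty} P_{0\nu}^n \exp(i\nu\theta), \qquad -\pi \le \theta \le \pi. \tag{2.2}$$

This is so since the $X_k$, $k = 1, \ldots, n$, are independent, identically distributed
random variables and thus

$$P_{0j}^n = \Pr\{S_n = j\},$$

$$\sum_{\nu=-\infty}^{\infty} P_{0\nu}^n \exp(i\nu\theta) = E[\exp(iS_n\theta)] = E[\exp(i\theta(X_1 + \cdots + X_n))]
= \prod_{k=1}^{n} E[\exp(i\theta X_k)] = \prod_{k=1}^{n} \phi(\theta) = [\phi(\theta)]^n.$$

Note, further, that

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(j-k)\theta}\, d\theta = \delta_{jk} =
\begin{cases} 0 & \text{if } j \ne k, \\ 1 & \text{if } j = k \end{cases}
\qquad (\text{here } i = \sqrt{-1}), \tag{2.3}$$

i.e., the functions $(2\pi)^{-1/2} e^{ik\theta}$ (k any integer) are orthonormal. Thus, when we
multiply both sides of (2.2) by $(2\pi)^{-1} e^{-ik\theta}$ and integrate with respect to $\theta$ over
$[-\pi, \pi]$ only the term $\nu = k$ remains on the right and this gives the formula

$$P_{0k}^n = (2\pi)^{-1} \int_{-\pi}^{\pi} e^{-ik\theta} [\phi(\theta)]^n\, d\theta. \tag{2.4}$$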
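The inversion formula (2.4) lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions: the integral is replaced by a uniform Riemann sum over $[-\pi, \pi)$ (which reproduces the integral of a trigonometric polynomial up to rounding once the grid is finer than the highest frequency), and the test case is the symmetric coin-tossing walk, for which $P_{0k}^n$ is a binomial probability. The names `char_fn` and `n_step_by_inversion` are hypothetical.

```python
import cmath
import math


def char_fn(step_dist, theta):
    """phi(theta) = E[exp(i X theta)] for an integer-valued step law."""
    return sum(p * cmath.exp(1j * k * theta) for k, p in step_dist.items())


def n_step_by_inversion(step_dist, n, k, grid=4096):
    """Approximate (2.4) by a uniform Riemann sum over [-pi, pi)."""
    total = 0.0 + 0.0j
    for m in range(grid):
        theta = -math.pi + 2.0 * math.pi * m / grid
        total += cmath.exp(-1j * k * theta) * char_fn(step_dist, theta) ** n
    return (total / grid).real


coin = {1: 0.5, -1: 0.5}  # symmetric coin tossing, used as a test case
n, k = 10, 2
by_inversion = n_step_by_inversion(coin, n, k)
by_counting = math.comb(n, (n + k) // 2) / 2 ** n  # 6 heads out of 10 tosses
print(by_inversion, by_counting)
```

Both numbers should agree to within rounding error.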


Before we can proceed to state and prove the result concerning the rate of
convergence of $P_{0j}^n$ to zero as $n \to \infty$, we introduce some additional concepts
and discuss their properties. We say that X is a periodic random variable if the
only values of X that may be achieved with positive probability are contained
in the set

$$X = \omega + rc, \qquad r = 0, \pm 1, \pm 2, \ldots,$$

where $\omega$ and c are integers and $|c| \ne 1$. We may note that "the Markov chain
$\{S_n\}$ is periodic" implies that the random variables $X_k$ are periodic random
variables but not conversely. (The student should prove this.) Recall that
"$\{S_n\}$ is aperiodic" means that the smallest additive group generated by the
integers i for which $\Pr\{X_k = i\} > 0$ is the group of all integers.

The example

$$X_k = \begin{cases} +1 & \text{with probability } p, \\ -1 & \text{with probability } q \end{cases}$$

is a periodic random variable. In fact, we can represent its possible values in the
form

$$X = 1 + 2r, \qquad r = 0, \pm 1, \ldots.$$

(Here $c = 2$, $\omega = 1$.)
Lemma 2.1. The $X_k$ are periodic random variables if and only if their
characteristic function $\phi(\theta)$ has the property

$$|\phi(\theta_0)| = 1 \tag{2.5}$$

for some $\theta_0 \ne 0$, $-\pi \le \theta_0 \le \pi$.

Proof. Suppose that for $\theta_0 = h \ne 0$ ($-\pi \le h \le \pi$)

$$|\phi(h)| = 1.$$

Then there is a real number $\omega$ such that $\phi(h) = e^{i\omega h}$, and hence

$$1 = e^{-i\omega h} \phi(h) = \sum_{j=-\infty}^{\infty} P_{0j}\, e^{i(j-\omega)h}
= \sum_{j=-\infty}^{\infty} P_{0j} \cos (j - \omega)h + i \sum_{j=-\infty}^{\infty} P_{0j} \sin (j - \omega)h.$$

The real parts of both sides must be equal; thus

$$1 = \sum_{j=-\infty}^{\infty} P_{0j} \cos (j - \omega)h.$$

Since $|\cos x| \le 1$ for all x, we infer that for all j accessible from zero, i.e., for
those j for which $P_{0j} > 0$, necessarily

$$\cos (j - \omega)h = 1.$$

This requires the existence of an integer r for which

$$(j - \omega)h = 2\pi r, \qquad \text{for some } r = 0, \pm 1, \pm 2, \ldots,$$

i.e., any j accessible from zero may be expressed in the form $j = \omega + (2\pi/h)r$,
$r = 0, \pm 1, \pm 2, \ldots$, and plainly $|c| = |2\pi/h| \ne 1$. Clearly, $X_k$ may take only
these j values, i.e., $X_k$ is a periodic random variable.

Conversely, if the possible values of $X_k$ are contained in the set

$$X_k = \omega + rc, \qquad r = 0, \pm 1, \pm 2, \ldots$$

($\omega$ and c are integers, $|c| \ne 1$), then

$$\phi(\theta) = \sum_{r=-\infty}^{\infty} P_{0,\omega+rc}\, e^{i(\omega+rc)\theta}$$

and

$$\sum_{r=-\infty}^{\infty} P_{0,\omega+rc} = 1.$$

Now let $\theta_0 = 2\pi/c$. Since c is an integer, $|c| \ne 1$, it follows that $\theta_0 \ne 0$, $-\pi \le \theta_0 \le \pi$, and

$$\phi\left(\frac{2\pi}{c}\right) = \sum_{r=-\infty}^{\infty} P_{0,\omega+rc}\, e^{i\omega(2\pi/c)} e^{i2\pi r}
= e^{i\omega(2\pi/c)} \sum_{r=-\infty}^{\infty} P_{0,\omega+rc} = e^{i\omega(2\pi/c)}.$$

Thus (2.5) is satisfied with $\theta_0 = 2\pi/c \ne 0$, $-\pi \le \theta_0 \le \pi$. ∎
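Lemma 2.1 can be illustrated with two small step laws. This is a sketch only: the two distributions below and the helper `char_fn` are assumptions chosen for illustration. The first law lives on $1 + 2r$ (so $\omega = 1$, $c = 2$) and its characteristic function has modulus 1 at $\theta_0 = 2\pi/c = \pi$; the second is nonperiodic and its modulus stays away from 1 once $\theta$ is bounded away from 0.

```python
import cmath
import math


def char_fn(step_dist, theta):
    """phi(theta) = E[exp(i X theta)] for an integer-valued step law."""
    return sum(p * cmath.exp(1j * k * theta) for k, p in step_dist.items())


periodic = {1: 0.6, -1: 0.4}             # values of the form 1 + 2r
nonperiodic = {1: 0.5, 0: 0.2, -1: 0.3}  # generates all integers, c = 1 only

theta0 = 2.0 * math.pi / 2               # theta_0 = 2*pi/c with c = 2
mod_periodic = abs(char_fn(periodic, theta0))

grid = [-math.pi + 2.0 * math.pi * m / 2000 for m in range(2001)]
max_nonperiodic = max(
    abs(char_fn(nonperiodic, t)) for t in grid if abs(t) >= 0.5
)
print(mod_periodic, max_nonperiodic)
```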


We assume henceforth, unless it is stated explicitly to the contrary, that $X_k$
is a nonperiodic random variable.


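Lemma 2.2. There exists a constant $\lambda > 0$, such that

$$1 - \operatorname{Re} \phi(\theta) \ge \lambda\theta^2, \qquad -\pi \le \theta \le \pi. \tag{2.6}$$

Proof. Note that

$$1 - \operatorname{Re} \phi(\theta) = 1 - \sum_{j=-\infty}^{\infty} P_{0j} \cos j\theta = \sum_{j=-\infty}^{\infty} (1 - \cos j\theta) P_{0j}.$$

We employ the identity

$$1 - \cos a = 2 \sin^2 \frac{a}{2} \qquad \text{for any real } a.$$

Thus

$$1 - \operatorname{Re} \phi(\theta) = 2 \sum_{j=-\infty}^{\infty} \left(\sin^2 \frac{j\theta}{2}\right) P_{0j}
\ge 2 \sum_{|j| \le L} \left(\sin^2 \frac{j\theta}{2}\right) P_{0j} \tag{2.7}$$

for any positive L.
We next use the familiar inequality

$$|\sin x| \ge \frac{2|x|}{\pi} \qquad \text{for } -\frac{\pi}{2} \le x \le \frac{\pi}{2}. \tag{2.8}$$

One proof of (2.8) proceeds as follows: We claim that the function $(\sin x)/x$ is
decreasing over the range $0 < x \le \pi/2$. In fact,

$$\frac{d}{dx}\left(\frac{\sin x}{x}\right) = \frac{x \cos x - \sin x}{x^2} = \frac{\cos x\,(x - \tan x)}{x^2}, \qquad 0 < x < \frac{\pi}{2}.$$

But $1 \le \sec^2 x$ and integrating both sides from 0 to x gives $x \le \tan x$. Since
$\cos x > 0$ for $0 < x < \pi/2$ we infer that $(d/dx)[(\sin x)/x] \le 0$ and so $(\sin x)/x$
is decreasing for $0 < x \le \pi/2$. Therefore

$$\frac{\sin x}{x} \ge \frac{\sin(\pi/2)}{\pi/2} = \frac{2}{\pi}
\qquad \text{or} \qquad \sin x \ge \frac{2}{\pi} x, \quad 0 \le x \le \frac{\pi}{2}.$$

Since both sides of the last inequality are odd functions of x, (2.8) follows.
Using (2.8) in (2.7) gives

$$1 - \operatorname{Re} \phi(\theta) \ge 2 \sum_{|j| \le L} \left(\frac{j\theta}{\pi}\right)^2 P_{0j}
= \frac{2\theta^2}{\pi^2} \sum_{|j| \le L} j^2 P_{0j}, \tag{2.9}$$

valid for all $\theta$ such that $|j\theta| \le \pi$. But if $|j| \le L$, the condition $|j\theta| \le \pi$ will
be satisfied whenever

$$|\theta| \le \frac{\pi}{L}. \tag{2.10}$$

By choosing L large enough there must be at least one $j \ne 0$ such that $|j| \le L$
and $P_{0j} > 0$. Thus by specifying L large enough such that

$$C = 2\pi^{-2} \sum_{|j| \le L} j^2 P_{0j} > 0,$$

we have

$$1 - \operatorname{Re} \phi(\theta) \ge C\theta^2 \tag{2.11}$$

for all $|\theta| \le \pi/L$ with $C > 0$.

Thus far we have made no use of the aperiodicity of the random variables
$\{X_k\}$. We need this assumption, however, to estimate $1 - \operatorname{Re} \phi(\theta)$ for $|\theta| \ge \pi/L$.
We know that $X_k$ being nonperiodic random variables is equivalent to the
condition that $|\phi(\theta)| = 1$ is satisfied on $\theta \in [-\pi, \pi]$ only for $\theta = 0$ (Lemma
2.1). But $|\phi(\theta)| \le 1$ is always true for any characteristic function. Therefore,

$$1 - \operatorname{Re} \phi(\theta) \ge 1 - |\phi(\theta)| > 0 \tag{2.12}$$

for $\theta \ne 0$, $-\pi \le \theta \le \pi$. Since $1 - \operatorname{Re} \phi(\theta)$ is a continuous function of $\theta$ in
$[-\pi, \pi]$,

$$m = \min_{\pi \ge |\theta| \ge \pi/L} \{1 - \operatorname{Re} \phi(\theta)\}$$

exists and is positive in view of (2.12). This can certainly be expressed in the
form

$$1 - \operatorname{Re} \phi(\theta) \ge m \ge m \frac{\theta^2}{\pi^2}, \tag{2.13}$$

valid for all $\theta$ satisfying $\pi \ge |\theta| \ge \pi/L$. Now let

$$\lambda = \min(C, m/\pi^2).$$

Then (2.6) is satisfied for all $|\theta| \le \pi$. ∎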
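The constant $\lambda$ of Lemma 2.2 can be estimated numerically as the smallest value of $(1 - \operatorname{Re}\phi(\theta))/\theta^2$ over a grid in $(0, \pi]$. The sketch below is illustrative only: both step laws and the helper names are assumptions. For the nonperiodic law the ratio equals $0.8(1 - \cos\theta)/\theta^2$, whose minimum on $(0, \pi]$ is $1.6/\pi^2$; for the law concentrated on $\{-2, 2\}$ we have $\phi(\pi) = 1$, so no positive $\lambda$ can exist.

```python
import cmath
import math


def char_fn(step_dist, theta):
    """phi(theta) = E[exp(i X theta)] for an integer-valued step law."""
    return sum(p * cmath.exp(1j * k * theta) for k, p in step_dist.items())


def best_lambda(step_dist, grid_size=2000):
    """Smallest value of (1 - Re phi(theta)) / theta^2 on a grid in (0, pi]."""
    ratios = []
    for m in range(1, grid_size + 1):
        theta = math.pi * m / grid_size
        ratios.append((1.0 - char_fn(step_dist, theta).real) / theta ** 2)
    return min(ratios)


nonperiodic = {1: 0.5, 0: 0.2, -1: 0.3}
lattice = {2: 0.5, -2: 0.5}  # periodic: phi(pi) = 1

lam_nonperiodic = best_lambda(nonperiodic)
lam_lattice = best_lambda(lattice)
print(lam_nonperiodic, lam_lattice)
```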


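Now we are prepared to state the theorem presenting a bound on the rate of
convergence of $P_{0j}^n$ to zero.

Theorem 2.1. If the r.v.'s $\{X_k\}$ are aperiodic (i.e., nonperiodic), then for some
constant $A > 0$ (independent of j and n)

$$P_{0j}^n \le \frac{A}{\sqrt{n}} \tag{2.14}$$

for all integers j and $n \ge 1$.

Proof. It is clear from (2.4) that

$$P_{0j}^{2n} \le (2\pi)^{-1} \int_{-\pi}^{\pi} |\phi(\theta)|^{2n}\, d\theta. \tag{2.15}$$

Note that $|\phi(\theta)|^2$ is the characteristic function of an integer-valued random
variable. Indeed, since

$$\phi(\theta) = E[\exp(iX_k\theta)] \qquad \text{for any } k = 1, 2, \ldots$$

and

$$\overline{\phi(\theta)} = E[\exp(-iX_l\theta)] \qquad \text{for any } l = 1, 2, \ldots,$$

we have for $k \ne l$,

$$|\phi(\theta)|^2 = \phi(\theta)\overline{\phi(\theta)} = E[\exp(iX_k\theta)]\, E[\exp(-iX_l\theta)] = E[\exp(i(X_k - X_l)\theta)],$$

so that $|\phi(\theta)|^2$ is the characteristic function of the integer-valued random
variable $X_k - X_l$, where $X_k$ and $X_l$ are independent and identically distributed.
The property that $X_k$ is not a periodic random variable is equivalent to the
nonperiodic character of the random variable $X_k - X_l$. This is a direct
consequence of Lemma 2.1. Let $\psi(\theta) = |\phi(\theta)|^2$. We now make use of Lemma 2.2
in the case of the real characteristic function $\psi(\theta)$. The lemma provides us with
the inequality

$$1 - \psi(\theta) \ge \lambda\theta^2,$$

valid for all $\theta$ such that $-\pi \le \theta \le \pi$ for some $\lambda > 0$. We rewrite this relation
in the form

$$\psi(\theta) \le 1 - \lambda\theta^2 \le \exp(-\lambda\theta^2), \tag{2.16}$$

where the last inequality results from the relation $1 - y \le e^{-y}$, $y \ge 0$, which
follows by integration of the trivial inequality $e^{-x} \le 1$ over [0, y].
From (2.16), by integration we have

$$\int_{-\pi}^{\pi} [\psi(\theta)]^n\, d\theta \le \int_{-\pi}^{\pi} \exp(-n\lambda\theta^2)\, d\theta
= \frac{1}{\sqrt{n}} \int_{-\pi\sqrt{n}}^{\pi\sqrt{n}} \exp(-\lambda\alpha^2)\, d\alpha
\le (\sqrt{n})^{-1} \int_{-\infty}^{\infty} \exp(-\lambda\alpha^2)\, d\alpha. \tag{2.17}$$

Combining (2.15) and (2.17) yields

$$P_{0j}^{2n} \le \frac{1}{2\pi\sqrt{n}} \int_{-\infty}^{\infty} \exp(-\lambda\alpha^2)\, d\alpha = \frac{A_1}{\sqrt{n}} = \frac{\sqrt{2}\,A_1}{\sqrt{2n}}, \tag{2.18}$$

where

$$A_1 = (2\pi)^{-1} \int_{-\infty}^{\infty} \exp(-\lambda\alpha^2)\, d\alpha.$$

Since $|\phi(\theta)| \le 1$, we also obtain

$$P_{0j}^{2n+1} \le (2\pi)^{-1} \int_{-\pi}^{\pi} |\phi(\theta)|^{2n+1}\, d\theta \le (2\pi)^{-1} \int_{-\pi}^{\pi} |\phi(\theta)|^{2n}\, d\theta
\le \frac{A_1}{\sqrt{n}} \le \frac{\sqrt{3}\,A_1}{\sqrt{2n+1}}, \qquad n \ge 1. \tag{2.19}$$

Put $A = \max(1, \sqrt{3}\,A_1)$ (the factor $\sqrt{3}$ rather than $\sqrt{2}$ accommodates the odd
indices, since $2n + 1 \le 3n$); then (2.18) and (2.19) assert (2.14) for $n \ge 2$, while the case
$n = 1$ is trivial, as was to be shown. ∎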


It is important to emphasize that the bound exhibited in (2.14) applies
whether the Markov chain $\{S_n\}$ is recurrent or transient. It is instructive at
this point to consider the example of the classical stochastic model of coin
tossing. Specifically,

$$X_k = \begin{cases} 1 & \text{with probability } p, \\ -1 & \text{with probability } q. \end{cases}$$

Then the sequence $S_n$ describes a Markov chain with special transition
probabilities

$$P_{ij} = \begin{cases} p & \text{if } j = i + 1, \\ q & \text{if } j = i - 1, \\ 0 & \text{otherwise.} \end{cases}$$

Moreover, it is clear that

$$P_{00}^{2n} = \binom{2n}{n} p^n q^n,$$

whose asymptotic behavior [cf. (6.2) of Chapter 2] is

$$P_{00}^{2n} \sim \frac{(4pq)^n}{\sqrt{\pi n}}. \tag{2.20}$$

This shows that if $p \ne \frac{1}{2}$ then $P_{00}^{2n}$ tends exponentially fast (at a geometric
rate) to zero. In the case $p = q = \frac{1}{2}$, we see that the order $n^{-1/2}$ of the bound in (2.14) is exact.
This example is typical of the general situation. Actually a considerable
sharpening of (2.14) is available under the assumption that $E[X_k^2] < \infty$ and
$E[X_k] = \mu = 0$. In this case, Theorem 1.1 tells us that the Markov chain $\{S_n\}$
is recurrent and indeed null recurrent. Therefore $\sum_{n=0}^{\infty} P_{00}^n = \infty$ and $P_{0j}^n \to 0$.
An application of the central limit theorem proves that

$$\lim_{n\to\infty} \sqrt{n}\, P_{0j}^n = B, \tag{2.21}$$

where B is a positive finite constant independent of j. The proof of this result
is beyond the level of the book. We refer the reader to the works cited in the
footnotes†‡ for details. When $E[X_k^2] = \infty$ and $E[|X_k|^{1+\delta}] < \infty$ for some
$0 < \delta < 1$ but $E[|X_k|^{1+\delta'}] = \infty$ for $\delta' > \delta$, then frequently the precise
asymptotic relation replacing (2.21) becomes

$$\lim_{n\to\infty} n^{1/(1+\delta)} P_{0j}^n = B. \tag{2.22}$$

This is relevant when the central limit theorem is not applicable but rather
attraction to a proper stable law occurs. The theory of stable laws is an elaborate
area of probability theory of importance in applications, particularly to physics
and astronomy. We cannot enter into this vast subject in this elementary book.
We must be content merely to state its existence and encourage the reader to
pursue these topics in later courses on probability theory.
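The coin-tossing asymptotics (2.20) are easy to examine numerically. The following sketch assumes the standard return probability $P_{00}^{2n} = \binom{2n}{n} p^n q^n$ for that walk and evaluates it through logarithms of the gamma function to avoid overflow; the function names are hypothetical.

```python
import math


def return_prob(p, n):
    """P^{2n}_{00} = C(2n, n) (p q)^n for the +/-1 walk, computed via logs."""
    log_binom = math.lgamma(2 * n + 1) - 2.0 * math.lgamma(n + 1)
    return math.exp(log_binom + n * math.log(p * (1.0 - p)))


def asymptotic(p, n):
    """Right-hand side of (2.20): (4 p q)^n / sqrt(pi n)."""
    return math.exp(n * math.log(4.0 * p * (1.0 - p))) / math.sqrt(math.pi * n)


for n in (10, 100, 500):
    fair = return_prob(0.5, n)
    biased = return_prob(0.6, n)
    print(n, fair / asymptotic(0.5, n), math.sqrt(n) * fair, math.sqrt(n) * biased)
```

For $p = 1/2$ the ratio to (2.20) approaches 1 and $\sqrt{n}\,P_{00}^{2n}$ stabilizes near $1/\sqrt{\pi}$; for $p = 0.6$ the same scaled quantity collapses geometrically.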


3: Right Regular Sequences for the Markov Chain {S,} 


For special classes of Markov chains finer results pertaining to the theory of
recurrence, occupation time problems, the evaluation of distributions of various
functionals of the process, and other results are usually available. In this section
we will present a sharpening of the characterization for right regular sequences
(Theorem 5.2 of Chapter 11) in the special context of the Markov chain $\{S_n\}$.
If a Markov chain is irreducible and recurrent then the only right regular
vector is the constant vector (Theorem 5.2, Chapter 11). In the special case of
sums of independent random variables, this theorem can be extended to the
aperiodic transient case if we require the right regular vector to be bounded.
More precisely, we prove the following.

† B. V. Gnedenko and A. N. Kolmogorov, "Limit Distributions for Sums of Independent
Random Variables," Addison-Wesley, Reading, Massachusetts, 1954.
‡ F. Spitzer, "Principles of Random Walk," D. Van Nostrand, Princeton, New Jersey, 1964.


Theorem 3.1. If the Markov chain $\{S_n\}$ is irreducible with transition probability
matrix P and if $\{y_j\}$ is right regular, i.e., y satisfies

$$y_j \ge 0 \qquad \text{for all } j = 0, \pm 1, \pm 2, \ldots \tag{3.1}$$

and

$$\sum_{j=-\infty}^{\infty} P_{ij} y_j = y_i \qquad \text{for all } i = 0, \pm 1, \pm 2, \ldots \tag{3.2}$$

and $\{y_j\}$ is bounded, then $y_j$ = constant for all j.

Proof. Assume that $\{y_j\}$ satisfies (3.1) and (3.2) and is bounded. Let $k_0$ be
any state other than 0 that can be reached from state 0, i.e., there exists n such
that $P_{0k_0}^n > 0$. We keep $k_0$ fixed and define

$$z_j = y_j - y_{j-k_0}, \qquad j = 0, \pm 1, \pm 2, \ldots.$$

Now, using spatial homogeneity, we have

$$\sum_{j=-\infty}^{\infty} P_{ij}\, y_{j-k_0} = \sum_{k=-\infty}^{\infty} P_{i,k_0+k}\, y_k = \sum_{k=-\infty}^{\infty} P_{i-k_0,k}\, y_k = y_{i-k_0},$$

which says that $\{u_j\} = \{y_{j-k_0}\}_{j=-\infty}^{\infty}$ also satisfies (3.2) and therefore $\{z_j\}$
satisfies (3.2). Trivially, $\{z_j\}$ is bounded since $\{y_j\}$ is bounded. Let

$$M = \sup_j z_j < \infty \qquad \text{and} \qquad \sup_j |z_j| = M'. \tag{3.3}$$

We select a sequence of integers $\{r_n\}$ for which

$$\lim_{n\to\infty} z_{r_n} = M. \tag{3.4}$$


Since $\{z_j\}$ is bounded we may select a subsequence $\{r_n^{(1)}\}$ from $\{r_n\}$ such that

$$\lim_{n\to\infty} z_{1 + r_n^{(1)}}$$

exists. Then we can choose a further subsequence $\{r_n^{(-1)}\}$ of $\{r_n^{(1)}\}$ such that

$$\lim_{n\to\infty} z_{-1 + r_n^{(-1)}}$$

exists. Then select another subsequence $\{r_n^{(2)}\}$ of $\{r_n^{(-1)}\}$ such that

$$\lim_{n\to\infty} z_{2 + r_n^{(2)}}$$

exists. Continuing in this fashion we determine sequences $\{r_n^{(1)}\}, \{r_n^{(-1)}\}, \{r_n^{(2)}\},
\{r_n^{(-2)}\}, \ldots$, each a subsequence of the one preceding it. Now there is a sequence,
namely $\{s_n = r_n^{(n)}\}$ in this case, which from some point on is a subsequence
of each of the sequences

$$\{r_n^{(1)}\}, \{r_n^{(-1)}\}, \{r_n^{(2)}\}, \{r_n^{(-2)}\}, \ldots.$$

In fact, $\{s_n\}$ is a subsequence of $\{r_n^{(p)}\}$ at least from $n > |p|$ on. Because of the
construction we know that

$$\lim_{n\to\infty} z_{j + s_n} = z_j^* \tag{3.5}$$

exists for all integers j. (This procedure is called the method of diagonalization.)
Clearly, by the definition of $\{r_n\}$ [see (3.4)] we have

$$z_0^* = \lim_{n\to\infty} z_{s_n} = M \tag{3.6}$$

and by (3.3)

$$z_j^* = \lim_{n\to\infty} z_{j + s_n} \le M, \qquad j = 0, \pm 1, \pm 2, \ldots.$$

But we observed before that

$$\sum_{j=-\infty}^{\infty} P_{i+s_n,j}\, z_j = z_{i+s_n},$$

and taking limits on both sides as $n \to \infty$ we have

$$\lim_{n\to\infty} z_{i+s_n} = \lim_{n\to\infty} \sum_{j=-\infty}^{\infty} P_{i+s_n,j}\, z_j
= \lim_{n\to\infty} \sum_{j=-\infty}^{\infty} P_{i,j-s_n}\, z_j
= \lim_{n\to\infty} \sum_{l=-\infty}^{\infty} P_{il}\, z_{l+s_n}. \tag{3.7}$$

We now claim that interchange of limit and summation is permissible. To
validate this assertion, we are required to prove that for any prescribed $\epsilon > 0$
there exists an integer $n(\epsilon)$ such that

$$\left| \sum_{l=-\infty}^{\infty} P_{il}\,(z_{l+s_n} - z_l^*) \right| < \epsilon$$

provided $n \ge n(\epsilon)$. To this end, we determine L sufficiently large so that

$$\sum_{|l| > L} P_{il} < \frac{\epsilon}{4M'}.$$

Next we choose $n_0$ sufficiently large so that

$$|z_{l+s_n} - z_l^*| < \frac{\epsilon}{2} \qquad \text{for all } n \ge n_0 \text{ and } l \text{ satisfying } -L \le l \le L.$$

Now combining these estimates we have for $n \ge n_0(\epsilon)$

$$\left| \sum_{l=-\infty}^{\infty} P_{il}\,(z_{l+s_n} - z_l^*) \right|
\le 2M' \sum_{|l| > L} P_{il} + \sum_{|l| \le L} P_{il}\, |z_{l+s_n} - z_l^*| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

In passing to the limit under the summation sign in (3.7) we obtain

$$z_i^* = \sum_{l=-\infty}^{\infty} P_{il}\, z_l^*, \tag{3.8}$$

i.e., $z_i^*$ satisfies (3.2), although not necessarily (3.1). From (3.8) by iteration
we obtain

$$z_i^* = \sum_{j=-\infty}^{\infty} P_{ij}^n\, z_j^*, \qquad \text{for any } n \ge 0.$$

Thus, with $i = 0$ and with reference to (3.6), we conclude that

$$\sum_{j=-\infty}^{\infty} P_{0j}^n\, z_j^* = z_0^* = M, \qquad \text{for any } n \ge 0. \tag{3.9}$$

The left-hand side in (3.9) is a weighted average of numbers which are all $\le M$.
This is only possible if for all j for which $P_{0j}^n > 0$ for some $n \ge 0$ we have
$z_j^* = M$. In particular, by the definition of $k_0$,

$$z_{k_0}^* = M, \quad z_{2k_0}^* = M, \quad \ldots, \quad z_{tk_0}^* = M,$$

for any positive integer t. Then by (3.5) n may be chosen large enough such that
all the inequalities

$$z_{k_0+s_n} > M - \epsilon, \qquad z_{2k_0+s_n} > M - \epsilon, \qquad \ldots, \qquad z_{tk_0+s_n} > M - \epsilon$$

are satisfied simultaneously. Adding these inequalities yields

$$t(M - \epsilon) < z_{k_0+s_n} + z_{2k_0+s_n} + \cdots + z_{tk_0+s_n}$$
$$= (y_{k_0+s_n} - y_{s_n}) + (y_{2k_0+s_n} - y_{k_0+s_n}) + (y_{3k_0+s_n} - y_{2k_0+s_n}) + \cdots + (y_{tk_0+s_n} - y_{(t-1)k_0+s_n})$$
$$= y_{tk_0+s_n} - y_{s_n}.$$

Since the $y_j$ are bounded there exists $K > 0$ such that $|y_j| \le K$ for all integers j.
Then plainly

$$t(M - \epsilon) < 2K.$$

Since this must hold for any positive integer t and $\epsilon > 0$, M necessarily must be
negative or zero. Thus $y_j - y_{j-k_0} = z_j \le 0$ or

$$y_j \le y_{j-k_0} \tag{3.10}$$

for all integers j and any $k_0$ that is accessible from zero.

Examination of the preceding analysis shows that the hypothesis (3.1),
i.e., $y_j \ge 0$, was never used. Actually only the fact that the $|y_j|$ are bounded was
vital. Therefore we could follow the same procedure putting $\tilde{y}_j = -y_j$ in
place of $y_j$ throughout. Of course $\{\tilde{y}_j\}$ remains bounded and satisfies (3.2). The
arguments above now yield the conclusion

$$y_j \ge y_{j-k_0} \qquad \text{for all } j \text{ and for all states } k_0 \text{ accessible from 0.} \tag{3.11}$$

Comparing (3.10) and (3.11) we have

$$y_j = y_{j-k_0} \qquad \text{for all } j \text{ and for all states } k_0 \text{ accessible from 0.}$$

In particular

$$y_0 = y_{-k_0} \qquad \text{for all } k_0 \text{ accessible from 0.} \tag{3.12}$$

Now for the first time in the proof we will make use of the irreducibility
assumption. This assumption guarantees that all states are indeed accessible
from state 0. Therefore, (3.12) holds for all $k_0$ and each component $y_j$ is equal
to the constant $y_0$. ∎


As an application of Theorem 3.1, we complete this section by proving the
generalized renewal theorem for sums of independent identically distributed
integer-valued random variables. Throughout this discussion we shall assume
that the sums $\{S_n\}$ are aperiodic, i.e., that the smallest additive group generated
by the integers i for which $\Pr\{X_k = i\} > 0$ is the group of all integers.

The proof presented below is not the most direct, but several of the auxiliary 
results are of independent interest and the techniques are common to other 
studies of fluctuation theory for sums of independent r.v.’s. 


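Theorem 3.2. If $S_n = X_1 + X_2 + \cdots + X_n$, $n \ge 1$, are aperiodic and if
$E[|X_k|] < \infty$, $E[X_k] > 0$, then

$$\lim_{j\to\infty} G_{0j} = \lim_{j\to\infty} \sum_{n=0}^{\infty} P_{0j}^n = \frac{1}{E[X_1]}, \tag{3.13}$$

$$\lim_{j\to-\infty} G_{0j} = 0. \tag{3.14}$$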

It is convenient to divide the proof into several steps. For any quantity a let
$a^+ = \max(a, 0)$ and $a^- = \min(a, 0)$. Let

$$M_n = \min(S_1, S_2, \ldots, S_n).$$

Since obviously $M_n$ is nonincreasing,

$$\lim_{n\to\infty} M_n = \inf(S_1, S_2, \ldots) = M$$

exists and conceivably M could be $-\infty$. However, since $E[X_k] > 0$, the strong
law of large numbers assures us that

$$\Pr\{M = -\infty\} \le \Pr\{S_n < 0 \text{ infinitely often}\} = 0.$$

It follows that M is finite with probability 1.
Now

$$E[M_n] = E[\min(S_1, \ldots, S_n)]$$
$$= E[X_1 + \min(0, X_2, X_2 + X_3, \ldots, X_2 + \cdots + X_n)]$$
$$= E[X_1] + E[\min(0, X_2, X_2 + X_3, \ldots, X_2 + \cdots + X_n)]. \tag{3.15}$$

Since the $X_k$ are independent and identically distributed, we recognize the last term
as $E[M_{n-1}^-]$.
Letting n go to $\infty$ we obtain

$$E[M] = E[X_1] + E[M^-]. \tag{3.16}$$

(The student should justify the interchange of the limit operation and
expectation.) Obviously

$$E[M] = E[M^+] + E[M^-]$$

and comparison with (3.16) reveals that

$$E[M^+] = E[X_1]. \tag{3.17}$$

The proof given above is not fully rigorous since we do not know a priori
that $E[M^-] > -\infty$ and only in that case is (3.16) meaningful.
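The identity (3.17) can be tested by simulation. The sketch below is a Monte Carlo illustration under stated assumptions: the step law (values $-1$ and $2$ with equal probability, mean 0.5) is hypothetical, the infimum M is approximated by the minimum over a finite horizon, and the function name and default parameters are choices made for this example only.

```python
import random


def estimate_mean_m_plus(values, probs, trials=10_000, horizon=200, seed=0):
    """Monte Carlo estimate of E[M^+], where M = min(S_1, ..., S_horizon)
    stands in for inf(S_1, S_2, ...) and M^+ = max(M, 0)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s = 0
        running_min = None
        for x in rng.choices(values, weights=probs, k=horizon):
            s += x
            running_min = s if running_min is None else min(running_min, s)
        total += max(running_min, 0)
    return total / trials


values, probs = [-1, 2], [0.5, 0.5]  # hypothetical step law
mean_step = sum(v * p for v, p in zip(values, probs))
estimate = estimate_mean_m_plus(values, probs)
print(estimate, mean_step)
```

The estimate should be close to $E[X_1] = 0.5$, up to sampling error of order $10^{-2}$.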


In order to provide a complete proof we will need the following theorem, of 
some interest in itself. 


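Theorem 3.3. Let Z be a nonnegative integer-valued r.v., i.e., $\Pr\{Z = n\} = p_n$,
$n = 0, 1, \ldots$, $\sum_{n=0}^{\infty} p_n = 1$, with characteristic function $\phi(\theta)$. Suppose
$[\phi(\theta) - 1]/i\theta = (1/i\theta) \sum_{n=0}^{\infty} p_n (e^{in\theta} - 1)$ converges to $\alpha$ ($0 \le \alpha < \infty$) as $\theta \downarrow 0$.
Then $E[Z] = \sum_{n=0}^{\infty} n p_n = \alpha$.

Proof. By taking the real and imaginary part of $(\phi(\theta) - 1)/i\theta$ we conclude
that

$$\lim_{\theta \downarrow 0} \sum_{n=0}^{\infty} p_n \frac{\sin n\theta}{\theta} = \alpha
\qquad \text{and} \qquad
\lim_{\theta \downarrow 0} \sum_{n=0}^{\infty} p_n \frac{1 - \cos n\theta}{\theta} = 0. \tag{3.18}$$

Now for any fixed $0 < \theta < \pi/2$ we determine the largest integer $k = k(\theta)$
satisfying $\pi/2 \ge k\theta > 0$.
Consider the decomposition

$$\sum_{n=0}^{\infty} p_n \frac{\sin n\theta}{\theta} = \sum_{n=0}^{k} p_n \frac{\sin n\theta}{\theta} + \sum_{n=k+1}^{\infty} p_n \frac{\sin n\theta}{\theta}. \tag{3.19}$$

In view of (3.18) the sum $\sum_{n=0}^{\infty} p_n (\sin n\theta)/\theta$ is uniformly bounded for $\theta$ sufficiently
small.

We estimate $(\sin n\theta)/\theta$ in $\sum_{n=0}^{k} p_n (\sin n\theta)/\theta$ from below using the fact that
$(\sin \theta)/\theta$ is decreasing on $0 < \theta \le \pi/2$ (see page 79), which yields, in particular,

$$\frac{\sin \theta}{\theta} \ge \frac{\sin \pi/2}{\pi/2} = \frac{2}{\pi} \qquad \text{for } 0 < \theta \le \frac{\pi}{2}.$$

From the definition of k, it follows that $0 \le n\theta \le \pi/2$ for $0 \le n \le k$. Applying
the above inequality yields

$$\sum_{n=0}^{k} p_n \frac{\sin n\theta}{\theta} \ge \frac{2}{\pi} \sum_{n=0}^{k} n p_n.$$

But $1 - (\sin n\theta)/n\theta \ge 1 - (1/n\theta) \ge 1 - 2/\pi = b > 0$ for all n satisfying
$n\theta \ge \pi/2$. From the definition of k we know that $n > k$ implies $n\theta > \pi/2$.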


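Therefore the second sum of (3.19) is bounded in absolute value by $\theta^{-1} \sum_{n=k+1}^{\infty} p_n$, and

$$\frac{1}{\theta} \sum_{n=k+1}^{\infty} p_n \le \frac{1}{b\theta} \sum_{n=k+1}^{\infty} p_n \left(1 - \frac{\sin n\theta}{n\theta}\right)
= \frac{1}{b\theta} \sum_{n=k+1}^{\infty} p_n \frac{1}{\theta} \int_0^{\theta} (1 - \cos n\xi)\, d\xi$$
$$\le \frac{1}{b\theta} \sum_{n=k+1}^{\infty} p_n \int_0^{\theta} \frac{1 - \cos n\xi}{\xi}\, d\xi
= \frac{1}{b\theta} \int_0^{\theta} \sum_{n=k+1}^{\infty} p_n \left(\frac{1 - \cos n\xi}{\xi}\right) d\xi$$

(the interchange is justified since all the terms are nonnegative). The last integrand is
at most $\operatorname{Im}([\phi(\xi) - 1]/i\xi)$.
However, $\operatorname{Im}([\phi(\xi) - 1]/i\xi) \to 0$ as $\xi \downarrow 0$ in accordance with (3.18) and so its
average tends to zero, i.e.,

$$\lim_{\theta \downarrow 0} \frac{1}{\theta} \int_0^{\theta} \operatorname{Im}\left(\frac{\phi(\xi) - 1}{i\xi}\right) d\xi = 0.$$

Putting together the preceding estimates we see that the second sum of
(3.19) tends to zero as $\theta \downarrow 0$. It follows that

$$\sum_{n=0}^{k} n p_n \qquad (k = k(\theta))$$

is uniformly bounded for $\theta > 0$. Obviously $k = k(\theta)$ increases to infinity as
$\theta \downarrow 0$ and this implies the convergence of $\sum_{n=0}^{\infty} n p_n = E[Z]$. To finish the proof
of Theorem 3.3 we must establish that

$$\sum_{n=0}^{\infty} n p_n = \lim_{\theta \to 0} \frac{\phi(\theta) - 1}{i\theta}. \tag{3.20}$$

To this end, we prescribe $\epsilon > 0$, and determine $N(\epsilon)$ so that $\sum_{n=N+1}^{\infty} n p_n < \epsilon/2$,
which is certainly possible since $\sum_{n=0}^{\infty} n p_n$ converges by what was already
demonstrated.

Now, consider

$$\left| \frac{\phi(\theta) - 1}{i\theta} - \sum_{n=0}^{\infty} n p_n \right|
\le \left| \sum_{n=0}^{N} p_n \left( \frac{e^{in\theta} - 1}{i\theta} - n \right) \right|
+ \left| \sum_{n=N+1}^{\infty} p_n \frac{e^{in\theta} - 1}{i\theta} \right|
+ \sum_{n=N+1}^{\infty} n p_n.$$

Since $|(e^{in\theta} - 1)/i\theta| \le n$ holds, the second and third sum are each bounded by
$\epsilon/2$. Hence

$$\left| \frac{\phi(\theta) - 1}{i\theta} - E[Z] \right| \le \sum_{n=0}^{N} p_n \left| \frac{e^{in\theta} - 1}{i\theta} - n \right| + \epsilon.$$

For fixed N the right-hand sum tends to zero since each term does, and therefore

$$\limsup_{\theta \downarrow 0} \left| \frac{\phi(\theta) - 1}{i\theta} - E[Z] \right| \le \epsilon.$$

The left-hand side is a fixed nonnegative number and $\epsilon > 0$ can be chosen
arbitrarily small. The result (3.20) clearly follows. ∎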
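The statement of Theorem 3.3 can be illustrated numerically by evaluating the difference quotient $[\phi(\theta) - 1]/i\theta$ for decreasing $\theta$. The sketch below uses a hypothetical distribution with finite support (an assumption made for simplicity; the theorem itself requires no such restriction), and the helper name `difference_quotient` is not from the text.

```python
import cmath


def difference_quotient(pmf, theta):
    """[phi(theta) - 1] / (i theta) for a nonnegative integer-valued law."""
    phi = sum(p * cmath.exp(1j * n * theta) for n, p in pmf.items())
    return (phi - 1.0) / (1j * theta)


pmf = {0: 0.2, 1: 0.5, 3: 0.3}  # hypothetical law of Z
mean_z = sum(n * p for n, p in pmf.items())  # 1.4

for theta in (1e-1, 1e-2, 1e-3, 1e-4):
    print(theta, difference_quotient(pmf, theta))
print(mean_z)
```

The real part approaches $E[Z] = 1.4$ and the imaginary part tends to zero as $\theta \downarrow 0$, matching the two limits in (3.18).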


We are now prepared to give a rigorous proof of the identity (3.17). We
introduce the c.f.'s of $M_n$, $X_1$, and $M_{n-1}^-$, i.e.,

$$\phi_{M_n}(\theta) = \sum_{k=-\infty}^{\infty} e^{ik\theta} \Pr\{M_n = k\},$$

$$\phi_{X_1}(\theta) = \sum_{k=-\infty}^{\infty} e^{ik\theta} \Pr\{X_1 = k\},$$

and

$$\phi_{M_{n-1}^-}(\theta) = \sum_{k=-\infty}^{\infty} e^{ik\theta} \Pr\{M_{n-1}^- = k\}.$$

Note that the possible values of $M_{n-1}^-$ are restricted to the set of nonpositive
integers. Manifestly, $X_1$ and

$$\tilde{M}_{n-1}^- = \min(0, X_2, X_2 + X_3, \ldots, X_2 + X_3 + \cdots + X_n)$$

are independent r.v.'s. Moreover $\tilde{M}_{n-1}^-$ and $M_{n-1}^-$ are identically distributed
since $\tilde{M}_{n-1}^-$ is defined in terms of $X_2, X_3, \ldots, X_n$ in precisely the same way that
$M_{n-1}^-$ is formed from $X_1, \ldots, X_{n-1}$.

Since $M_n = X_1 + \tilde{M}_{n-1}^-$ [this identity is implicit in Eq. (3.15)], we deduce
the relation

$$\phi_{M_n}(\theta) = \phi_{X_1}(\theta)\, \phi_{\tilde{M}_{n-1}^-}(\theta) = \phi_{X_1}(\theta)\, \phi_{M_{n-1}^-}(\theta).$$

From the definitions we plainly have $M_n \to M$ and $M_{n-1}^- \to M^-$ as $n \to \infty$
(convergence is meant in the sense of distributions) and so, by P. Lévy's
convergence theorem (page 11 of A First Course), it follows that

$$\phi_M(\theta) = \phi_{X_1}(\theta)\, \phi_{M^-}(\theta). \tag{3.21}$$
But

$$\phi_M(\theta) = \sum_{k=-\infty}^{\infty} e^{ik\theta} \Pr\{M = k\}
= \sum_{k=1}^{\infty} e^{ik\theta} \Pr\{M^+ = k\} + \sum_{k=-\infty}^{-1} e^{ik\theta} \Pr\{M^- = k\} + \Pr\{M = 0\}.$$

Since

$$\Pr\{M = 0\} = \Pr\{M^- \ge 0\} + \Pr\{M^+ \le 0\} - 1 = \Pr\{M^- = 0\} + \Pr\{M^+ = 0\} - 1,$$

we can write the above expression in the form

$$\phi_M(\theta) = \sum_{k=0}^{\infty} e^{ik\theta} \Pr\{M^+ = k\} + \sum_{k=-\infty}^{0} e^{ik\theta} \Pr\{M^- = k\} - 1
= \phi_{M^+}(\theta) + \phi_{M^-}(\theta) - 1.$$

Inserting this formula into (3.21) yields

$$\phi_{M^+}(\theta) - 1 = \phi_{M^-}(\theta)\,(\phi_{X_1}(\theta) - 1). \tag{3.22}$$

Now dividing by $i\theta$ and then letting $\theta \to 0$, we obtain

$$\lim_{\theta \to 0} \frac{\phi_{X_1}(\theta) - 1}{i\theta} = E[X_1] \qquad \text{since } E[|X_1|] < \infty.$$

(A formal proof of this limit relation can be carried out as in the final arguments
of Theorem 3.3.) Trivially $\lim_{\theta \to 0} \phi_{M^-}(\theta) = 1$ and comparison with (3.22) shows
that

$$0 < \lim_{\theta \to 0} \frac{\phi_{M^+}(\theta) - 1}{i\theta} = E[X_1] < \infty. \tag{3.23}$$

Because $M^+$ is a nonnegative r.v. we can appeal to Theorem 3.3, which tells
us that the limit in (3.23) is $E[M^+]$ and thus $E[M^+] = E[X_1]$ as was claimed.
We also need the following lemma. 


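Lemma 3.1. Let $e_i$ ($i \le 0$) be the probability that the process $\{S_n\}$ starting at i
enters on the first step the positive states and thereafter never visits a nonpositive
state. Define $e_i = 0$ for $i > 0$. Then $\sum_{i=-\infty}^{0} e_i = E[X_1]$.

Proof. Clearly, from its very definition

$$e_i = \begin{cases}
\Pr\{\inf(S_1, S_2, \ldots) > -i\} = \Pr\{M > -i\}, & i \le 0, \\
0, & i > 0.
\end{cases}$$

Hence

$$\sum_{i=-\infty}^{0} e_i = \sum_{i=-\infty}^{0} \Pr\{M > -i\} = \sum_{j=0}^{\infty} \Pr\{M > j\}
= \sum_{j=0}^{\infty} \sum_{k=j+1}^{\infty} \Pr\{M = k\} = \sum_{k=1}^{\infty} \Pr\{M = k\} \sum_{j=0}^{k-1} 1$$
$$= \sum_{k=1}^{\infty} k \Pr\{M = k\} = \sum_{k=0}^{\infty} k \Pr\{M^+ = k\} = E[M^+] = E[X_1].$$

The proof of Lemma 3.1 is complete. ∎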


Let

$$V(i) = \Pr\{S_n \le 0 \text{ for some } n \ge 1 \mid S_0 = i\},$$

i.e., $V(i)$ is the probability, starting at $i$, of visiting the nonpositive axis.

Consider the realizations of the process which start at $i$ and in fact visit the nonpositive states. With probability 1, by the law of large numbers, since $E[X_1] = \mu > 0$ by assumption, such a path visits these states at most a finite number of times and thus there is a last time that it does. The probability that the last visit occurs at the $n$th trial is $\sum_{k=-\infty}^{0} P^n_{ik} e_k = \sum_{k=-\infty}^{\infty} P^n_{ik} e_k$; the last identity follows from the definition of $e_k$ for $k > 0$. Enumerating the contingencies of the last visit to the nonpositive states we get the important identity

$$V(i) = \sum_{n=1}^{\infty} \sum_{k=-\infty}^{\infty} P^n_{ik} e_k = \sum_{k=-\infty}^{\infty} e_k \Bigl( \sum_{n=1}^{\infty} P^n_{ik} \Bigr) = \sum_{k=-\infty}^{\infty} e_k \Bigl( \sum_{n=0}^{\infty} P^n_{ik} - \delta_{ik} \Bigr) = \sum_{k=-\infty}^{\infty} G_{ik} e_k - e_i \tag{3.24}$$

($\delta_{ik}$ is the Kronecker delta function). We are now in a position to establish the renewal theorem.


Proof of Theorem 3.2. From the definition of $G_{ij}$ we find

$$\sum_{k} P_{ik} G_{kj} = \sum_{k} P_{ik} \sum_{n=0}^{\infty} P^n_{kj} = \sum_{n=0}^{\infty} P^{n+1}_{ij} = G_{ij} - \delta_{ij}. \tag{3.25}$$

Since $G_{ij} \le G_{00}$ (Lemma 1.1) we conclude with the help of the diagonal procedure (see the proof of Theorem 3.1) that we may extract a subsequence $j_m \to \infty$ such that

$$\lim_{m \to \infty} G_{i j_m} = \varphi_i$$

exists for all $i$.

Letting $j = j_m$ tend to $+\infty$ we obtain from (3.25)

$$\sum_{k=-\infty}^{\infty} P_{ik} \varphi_k = \varphi_i \tag{3.26}$$

and $\{\varphi_i\}$ is bounded. [The reader should justify the interchange of limit and sum in passing from (3.25) to (3.26).] Appealing to Theorem 3.1, we know that the regular bounded sequence $\{\varphi_i\}$ is identically constant; call its value $\alpha$. From (3.24), we have

$$V(-j_m) = \sum_{k} G_{-j_m, k}\, e_k - e_{-j_m}.$$

Since $G_{-j_m, k} = G_{-k, j_m}$ and $\sum_k e_k$ converges we conclude that

$$\lim_{m \to \infty} V(-j_m) = \alpha \sum_{k=-\infty}^{\infty} e_k = \alpha E[X_1] \quad \text{(by Lemma 3.1)}.$$

But obviously $\lim_{m \to \infty} V(-j_m) = 1$. Therefore $\alpha = 1/E[X_1]$. If we had another sequence $j'_m \to \infty$ with the property that $G_{i j'_m}$ converges for all $i$ then the same argument would show that its limit is $(E[X_1])^{-1}$. It follows that (3.13) holds. ∎

The result of (3.14) is proved in a similar manner and will be omitted.


4: The Discrete Renewal Theorem


The discrete random renewal theorem was stated in Section 1 of Chapter 3 
and its proof, in a special case, was given in Section 2. Here we give an alternative 
proof based on Theorem 1.1 and the idea of coupling random processes. 
Let $X_1, X_2, \ldots$ be independent identically distributed nonnegative random variables with $p_k = \Pr\{X_1 = k\}$ for $k = 0, 1, \ldots$. We assume that $p_0 = 0$, $0 < p_1 < 1$, and $\mu = \sum_{k=0}^{\infty} k p_k < \infty$. Let $S_0 = 0$ and $S_n = X_1 + \cdots + X_n$ for $n \ge 1$ be the times of the successive "renewals" determined by the "lifetimes" $X_1, X_2, \ldots$. Let $v_n = \Pr\{S_l = n \text{ for some } l \ge 0\}$ be the probability that a renewal takes place at time $n$.

The renewal theorem that we will prove in this section asserts that $v_n \to 1/\mu$ as $n \to \infty$.

We introduce the so-called "stationary renewal process" associated with $\{X_n\}$. Let $Y_0, Y_1, \ldots$ be nonnegative independent random variables, representing component lifetimes, and let $T_n = Y_0 + \cdots + Y_n$ be the corresponding renewal instants. While $Y_1, Y_2, \ldots$ are assumed to follow the probability distribution $\{p_k\}$, the initial lifetime $Y_0$ is given the distribution

$$\Pr\{Y_0 = k\} = (p_{k+1} + p_{k+2} + \cdots)/\mu \quad \text{for } k = 0, 1, \ldots.$$

Let $u_n = \Pr\{T_l = n \text{ for some } l \ge 0\}$ be the probability that a renewal takes place at time $n$ in the stationary renewal process determined by $\{Y_n\}$.

In the stationary renewal process, renewals occur at a constant rate (probability) exactly equal to $u_n = 1/\mu$. In the ordinary renewal process we wish to show that renewals occur asymptotically at rate $\lim v_n = 1/\mu$. To establish this result we couple the two processes together.

Let $U_0 = X_0 - Y_0 = -Y_0$ and $U_n = U_{n-1} + (X_n - Y_n)$ measure the discrepancy between the two processes. Let $N = \min\{n \ge 0 : U_n = 0\}$ be the first instant that the same renewal simultaneously takes place in both processes. Then $\Pr\{N < \infty\} = 1$ because $E[X_n - Y_n] = 0$ and by Theorem 1.1, the process $\{U_n - U_0\}$ is recurrent, whence

$$\Pr\{N < \infty\} = \sum_{l=0}^{\infty} \Pr\{U_0 = -l \text{ and } U_n - U_0 \text{ visits } l\} = \sum_{l=0}^{\infty} \Pr\{U_0 = -l\} = 1.$$


Since $N$ is finite we may couple the two renewal processes by assuming $X_n = Y_n$ for $n > N$. The probabilistic properties of the individual renewal processes $\{X_n\}$ and $\{Y_n\}$ are not affected by this coupling. Then

$$v_n = \Pr\{S_l = n \text{ for some } l \ge 0\} = \Pr\{S_l = n \text{ for some } l \ge N\} + \Pr\{S_l = n \text{ for some } l < N\}.$$

Because of the coupling we have $S_l = T_l$ for $l \ge N$, whence

$$v_n = \Pr\{T_l = n \text{ for some } l \ge N\} + \Pr\{S_l = n \text{ for some } l < N\}$$

$$= \Pr\{T_l = n \text{ for some } l \ge 0\} + \Pr\{S_l = n \text{ for some } l < N\} - \Pr\{T_l = n \text{ for some } l < N\}.$$

Now $\Pr\{T_l = n \text{ for some } l \ge 0\} = 1/\mu$ for all $n$. Thus to complete the demonstration that $v_n \to 1/\mu$ as $n \to \infty$, we need only show

$$\lim_{n \to \infty} \Pr\{S_l = n \text{ for some } l < N\} = 0$$

and

$$\lim_{n \to \infty} \Pr\{T_l = n \text{ for some } l < N\} = 0.$$

But because $S_l < S_N$ for $l < N$, we have

$$\lim_{n \to \infty} \Pr\{S_l = n \text{ for some } l < N\} \le \lim_{n \to \infty} \Pr\{S_N > n\} = 0,$$

and similarly,

$$\lim_{n \to \infty} \Pr\{T_l = n \text{ for some } l < N\} \le \lim_{n \to \infty} \Pr\{T_N > n\} = 0.$$

This completes the proof that $v_n \to 1/\mu$ as $n \to \infty$.


Remark. The assumption that $0 < p_1 < 1$ implies that the state space for the process $\{U_n\}$ comprises all integers and that this chain is aperiodic. The weaker assumption that the greatest common divisor of the set $\{k : p_k > 0\}$ is 1 will suffice.
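
The two renewal sequences of this section can be computed exactly for a small example. The sketch below is an illustration only, not part of the proof: the lifetime distribution $p_1 = 1/2$, $p_2 = 1/3$, $p_3 = 1/6$ is a hypothetical choice (so $\mu = 5/3$), and the function names are ours. It obtains $v_n$ from the first-step recursion $v_n = \sum_k p_k v_{n-k}$ and $u_n$ by conditioning on the initial lifetime $Y_0$.

```python
from fractions import Fraction

def renewal_probabilities(p, n_max):
    """v[n] = Pr{S_l = n for some l >= 0}, where p[k] = Pr{X = k} and p[0] = 0."""
    v = [Fraction(1)] + [Fraction(0)] * n_max  # v_0 = 1 because S_0 = 0
    for n in range(1, n_max + 1):
        # Condition on the first lifetime X_1 = k.
        v[n] = sum(p[k] * v[n - k] for k in range(1, min(n, len(p) - 1) + 1))
    return v

def stationary_probabilities(p, n_max):
    """u[n] for the process whose first lifetime Y_0 has pmf (p_{k+1} + p_{k+2} + ...)/mu."""
    mu = sum(k * pk for k, pk in enumerate(p))
    initial = [sum(p[k + 1:]) / mu for k in range(len(p))]  # Pr{Y_0 = k}
    v = renewal_probabilities(p, n_max)
    # Condition on Y_0 = k; afterwards renewals follow the ordinary process.
    return [sum(initial[k] * v[n - k] for k in range(min(n, len(p) - 1) + 1))
            for n in range(n_max + 1)]

# Hypothetical lifetime distribution (an assumption made for this example only).
p = [Fraction(0), Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
mu = sum(k * pk for k, pk in enumerate(p))
v = renewal_probabilities(p, 60)
u = stationary_probabilities(p, 60)
```

For this example every $u_n$ equals $1/\mu$ exactly, while $v_n$ starts at $v_0 = 1$ and approaches $1/\mu$ as $n$ grows, in line with the renewal theorem.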


Elementary Problems 


1. Let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed, integer-valued random variables. Define the partial sums $S_0 = 0$ and $S_k = X_1 + X_2 + \cdots + X_k$ for $k \ge 1$. The subscript $n \ge 1$ is called a ladder index if $S_n > S_j$ for $j = 0, 1, \ldots, n - 1$. Call the event that $n$ is a ladder index $\mathscr{E}$. Define $Y_0 = 0$ and $Y_N$ as the time (i.e., the index) of the last occurrence of $\mathscr{E}$ where the present trial is the $N$th. Let $W$ denote the time of first occurrence of $\mathscr{E}$. Suppose $\mathscr{E}$ occurs at trial $n$. Prove that the number of trials until the next occurrence of $\mathscr{E}$ is independent of $n$ and distributed as $W$.


2. Under the hypothesis of Elementary Problem 1 prove the identity 
2 1— F(t 1 
2 =P ae 
where $F(t) = \sum_n \Pr\{W = n\}\, t^n$.
Hint: Use the relation
$$\Pr\{Y_n = k\} = \Pr\{Y_k = k\} \Pr\{Y_{n-k} = 0\}.$$
3. Under the hypothesis of Elementary Problem 1 prove the exponential representation

$$U(t) = \exp\Bigl\{ \sum_{n=1}^{\infty} \frac{t^n}{n} \bigl( E[Y_n] - E[Y_{n-1}] \bigr) \Bigr\},$$

where $U(t) = 1/(1 - F(t))$.
Hint: With the aid of Elementary Problem 2 derive the differential equation

$$\frac{U'(t)}{U(t)} = \sum_{n=1}^{\infty} E[Y_n - Y_{n-1}]\, t^{n-1}$$

and solve it.
Problems


1. Consider a sample space consisting of the $n$ cyclic permutations of $(a_1, a_2, \ldots, a_n)$ with each permutation having probability $1/n$. For a given point $x = (a_k, a_{k+1}, \ldots, a_{k+n-1})$ where by definition $a_{n+l} = a_l$, $l = 1, \ldots, n - 1$, let $N(x)$ be the number of partial sums among $\{a_k, a_k + a_{k+1}, \ldots, a_k + \cdots + a_{k+n-1}\}$ which are zero. Let $M(x)$ be the number of distinct partial sums. Show that if $\sum_{i=1}^{n} a_i = 0$, then $E[1/N] = M(x)/n$ for any $x$.


2. Let $X_1, X_2, \ldots, X_n, \ldots$ be independent random variables uniformly distributed on (0, 1]. Show that for $0 < a < n - 1$

$$\Pr\{X_1 + \cdots + X_n \le a\} = \frac{1}{n!} \sum_{j=0}^{[a]} (-1)^j \binom{n}{j} (a - j)^n.$$

$[a]$ denotes, as usual, the greatest integer smaller than or equal to $a$.
Hint: Use induction with respect to $n$.


3. (Refer to Problem 2.) Establish the identity

$$1 = \sum_{j=0}^{n} (-1)^j \binom{n}{j} \frac{(n - j)^n}{n!}.$$


4. In Problem 2 let $r$ be the index such that $X_1 + \cdots + X_{r-1} \le a$ but $X_1 + \cdots + X_r > a$.
Show that


Hint: Verify the identity $E(r) = \sum_{n=0}^{\infty} \Pr\{r > n\} = \sum_{n=0}^{\infty} \Pr\{X_1 + \cdots + X_n \le a\}$ and then use the result of Problem 2.


5. Let $X_1, X_2, \ldots$ be independent, identically distributed, integer-valued random variables. Define

$$S_0 = 0 \quad \text{and} \quad S_k = X_1 + X_2 + \cdots + X_k \quad \text{for } k = 1, 2, \ldots.$$

Let $f^n_{00}$ be the probability that the process $\{S_n\}$ first returns to the origin at the $n$th step. Let $\gamma_n$ be the probability that at the $n$th step the process will occupy a new state, i.e., $\gamma_n = \Pr\{S_n \ne S_0, S_n \ne S_1, S_n \ne S_2, \ldots, S_n \ne S_{n-1}\}$. Prove that

$$\gamma_n = \Pr\{S_1 \ne 0, S_2 \ne 0, \ldots, S_n \ne 0\}.$$

In the case $E[|X_i|] < \infty$ and $E[X_i] = 0$ express $\gamma_n$ in terms of $\{f^k_{00}\}$.


6. Define $S_n$ as in Problem 5. Let $R_n$ be the number of distinct states visited in the first $n$ steps by the process $\{S_k\}_{k=0}^{\infty}$. Compute the expected value of $R_n$ in terms of $\{\gamma_k\}$ defined in the preceding problem.


Solution: $E[R_n] = \sum_{i=0}^{n} \gamma_i$.

7. Under the conditions of Problem 5 prove that

$$\lim_{n \to \infty} \frac{E[R_n]}{n} = 1 - f_{00},$$

where

$$f_{00} = \sum_{k=1}^{\infty} f^k_{00}.$$


8. We retain the notation of Problem 5. Let $L_n = j$ if and only if $S_j > S_i$, $0 \le i < j$, and $S_j \ge S_i$ for $j < i \le n$, i.e., $L_n$ is the first index of $S_i$ where $\max_{0 \le i \le n} S_i$ is achieved. Prove the identity

$$\Pr\{L_n = j\} = \Pr\{L_j = j\} \Pr\{L_{n-j} = 0\}.$$

9. Define $X_t$ with $X_t$ a real number and $t = 0, 1, 2, \ldots$ as follows: $X_0 = 0$,

$$X_t = \begin{cases} \tfrac12 (X_{t-1} + 1) & \text{with probability } \tfrac12, \\ \tfrac12 (X_{t-1} - 1) & \text{with probability } \tfrac12. \end{cases}$$

Show that in the limit as $t \to \infty$, the distribution of $X_t$ tends to the uniform distribution on $(-1, 1)$.

Hint: The distribution of $X_t$ tends to the distribution of $Y = \sum_{k=1}^{\infty} Y_k (\tfrac12)^k$ where the $Y_k$ are identically and independently distributed with values $\pm 1$ equally likely. Let $f(s) = \prod_{k=1}^{\infty} \cos(s/2^k)$, which satisfies the functional equation

$$f(2s) = (\cos s) f(s).$$

Show that the only continuous solution of the latter equation satisfying $f(0) = 1$ is $(\sin s)/s$, which is the characteristic function of the uniform distribution on $(-1, 1)$.


10. Consider a random walk on the integer lattice of the positive quadrant in two dimensions. If at any step the process is at $(m, n)$, it moves at the next step to $(m + 1, n)$ or $(m, n + 1)$ with probability $\tfrac12$ each. Let the process start at (0, 0). Let $\Gamma$ be any curve connecting neighboring lattice points (extending from the Y axis to the X axis) in the first quadrant. Show that $E[Y_1] = E[Y_2]$, where $Y_1$ and $Y_2$ denote the number of steps to the right and up, respectively, before hitting the boundary $\Gamma$. The diagram describes an example of $\Gamma$.

Hint: (a) First consider the case when the curve $\Gamma$ consists of two segments AB and BC. AB is the horizontal line extending from coordinates (0, 1) to (N, 1) and BC the vertical segment from (N, 1) to (N, 0). In this case verify

$$E[Y_2] = \tfrac12 + \sum_{k=1}^{N-1} (\tfrac12)^{k+1}, \qquad E[Y_1] = \sum_{k=1}^{N-1} k (\tfrac12)^{k+1} + N (\tfrac12)^{N},$$

and that these quantities are equal.
(b) Any region bounded by the X and Y axes and a curve $\Gamma$ can be broken into blocks as in case (a) above. Use an appropriate form of the addition law of expectations and the result proved in case (a).


Problems 11-16 are based on the following model 


Consider a random walk $X_n = (A_n, B_n)$ in the plane where the possible states are all points with integer coordinates in the two-dimensional space. Assume that the probability of transition in one step from any state to any of the four neighboring states is $\tfrac14$. Let $T$ be the time that the random walk starting at the origin first hits the 45° line. Let $X_T$ denote the point where the random walk hits the 45° line. Define

$$Q_{0j} = \Pr\{X_T = (j, j) \mid X_0 = (0, 0)\}.$$

This is the transition probability of a one-dimensional random walk which is the same as the original random walk observed only at times when it hits the 45° line. Define the characteristic function of this random walk as

$$\psi(\theta) = \sum_{m=-\infty}^{\infty} Q_{0m}\, e^{im\theta}, \qquad -\infty < \theta < \infty.$$


11. Define $U_0 = V_0 = 0$, $U_n = A_n + B_n$, and $V_n = A_n - B_n$, $n = 1, 2, \ldots$. Prove that the sequence of random variables $\{U_n\}$ is independent of the sequence $\{V_n\}$. (The variables $\{U_n, V_n\}$ produce a change of coordinate system so that the 45° lines comprise one of the system of coordinates.)

Hint: (a) Show first that

$$\Pr\{U_m - U_{m-1} = r \mid U_0 = V_0 = 0\} = \begin{cases} \tfrac12 & \text{if } |r| = 1, \\ 0 & \text{if } |r| \ne 1, \end{cases} \qquad m = 1, 2, \ldots,$$

and

$$\Pr\{V_n - V_{n-1} = s \mid U_0 = V_0 = 0\} = \begin{cases} \tfrac12 & \text{if } |s| = 1, \\ 0 & \text{if } |s| \ne 1, \end{cases} \qquad n = 1, 2, \ldots.$$

(b) Show next that

$$\Pr\{U_n - U_{n-1} = \pm 1,\; V_n - V_{n-1} = \pm 1 \mid U_0 = V_0 = 0\} = \tfrac14$$

and

$$\Pr\{U_n - U_{n-1} = r,\; V_n - V_{n-1} = s \mid U_0 = V_0 = 0\} = 0$$

if either $|r| \ne 1$ or $|s| \ne 1$ ($n, m = 1, 2, \ldots$).
Use (a) and (b) to prove that the sequence $\{U_m - U_{m-1}\}$ is independent of the sequence $\{V_n - V_{n-1}\}$ and from this conclude that $\{U_n\}$ is independent of $\{V_n\}$.
12. Prove that the random variable $T$ is independent of the sequence $\{U_n\}$.

13. Note that $T = k$ implies $V_k = 0$. Establish the formula

$$\psi(\theta) = \sum_{k=1}^{\infty} \Pr\{T = k \mid X_0 = (0, 0)\} \sum_{l=-\infty}^{\infty} e^{il\theta} \Pr\{U_k = 2l \mid X_0 = (0, 0)\}.$$

14. Using the fact that $U_n$ describes a one-dimensional random walk show that

$$\psi(\theta) = \sum_{k=1}^{\infty} \Pr\{T = k \mid X_0 = (0, 0)\} (\cos \theta/2)^k.$$

15. Show that the generating function of $T$ is

$$E[s^T] = 1 - \sqrt{1 - s^2}.$$

16. Prove the formula

$$\psi(\theta) = 1 - |\sin \theta/2|, \qquad -\infty < \theta < \infty.$$


NOTES


The source of inspiration for this chapter is the elegant book by Spitzer [1]. Our discussion 
is a simplified version of a scant few topics from this important work. 


Excellent textbooks on advanced probability which contain much material on limit theorems for sums of independent random variables are those by Gnedenko and Kolmogorov [2], Loève [3], and Rényi [4]. A modern comprehensive treatment of limit theorems in probability theory and their ramifications is Volume II of Feller [5]. Petrov's book [6] is an excellent source of recent developments without assuming identical distributions for the summands.


Chapter 13


In this chapter we present a variety of applications of some of the methods of Poisson 
processes and sums of independent random variables. 


1: Order Statistics and Their Relation to Poisson Processes 


Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent, identically distributed, random variables with continuous, strictly increasing, cumulative distribution function $F(y)$. We define random variables $Y_1^*, \ldots, Y_n^*$ by

$$Y_i^* = \text{the $i$th smallest among } Y_1, Y_2, \ldots, Y_n.\dagger$$

In particular,

$$Y_1^* = \min\{Y_1, Y_2, \ldots, Y_n\} \quad \text{and} \quad Y_n^* = \max\{Y_1, Y_2, \ldots, Y_n\}.$$

Clearly, we have

$$Y_1^* \le Y_2^* \le \cdots \le Y_n^*.$$

$Y_i^*$ is called the $i$th-order statistic of the sample $(Y_1, \ldots, Y_n)$ and $(Y_1^*, \ldots, Y_n^*)$ is called the set of order statistics of size $n$ associated with the sample $(Y_1, \ldots, Y_n)$.

This chapter will be concerned with the distribution theory of the order statistics of a sample, their relationships to Poisson processes, and other applications. First, however, we will make a significant simplification without loss of generality.

Let us set

$$X_i = F(Y_i) \qquad (i = 1, \ldots, n)$$

† In case of ties, we make an arbitrary choice from among those $Y_i$ that qualify. Actually the event of at least two equal $Y_i$ values occurs with zero probability and for our purposes can be ignored.

and compute the distribution of $X_i$:

$$\Pr\{X_i \le x\} = \Pr\{F(Y_i) \le x\} = \Pr\{Y_i \le F^{-1}(x)\} = F[F^{-1}(x)] = x \quad \text{for } 0 \le x \le 1,\; i = 1, \ldots, n, \tag{1.1}$$

where $F^{-1}$, the inverse function of $F$, is uniquely defined by our assumptions about $F$. Further, since $0 \le F(y) \le 1$,

$$\Pr\{X_i \le x\} = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x > 1 \end{cases} \quad \text{for } i = 1, \ldots, n. \tag{1.2}$$

Thus, by (1.1) and (1.2), $X_i$ is uniformly distributed over [0, 1] for every $i = 1, \ldots, n$ regardless of the form the continuous, strictly increasing function $F$ takes. Notice that the order relationship among the $\{Y_i^*\}$ is preserved by the transformation $X_i = F(Y_i)$. This means that instead of investigating the order statistics $\{Y_i^*\}$ corresponding to the general sample $(Y_1, \ldots, Y_n)$ we may study the order statistics $\{X_i^*\}$ taken from the uniform distribution on [0, 1].

Therefore from now on we restrict ourselves to studying the order statistics

$$X_1^* \le X_2^* \le \cdots \le X_n^* \tag{1.3}$$

based on a sample of independent uniformly distributed random variables $X_1, X_2, \ldots, X_n$ on the interval [0, 1].

The fact that (1.3) holds for the random variables $X_1^*, X_2^*, \ldots, X_n^*$ clearly indicates that they are not independent. We will first determine the joint distribution of the order statistics $X_1^*, \ldots, X_n^*$, or rather its probability density function, denoted by $f^*(x_1, \ldots, x_n)$. That this exists will be clear from the proof. Prescribing $0 < x_1 < x_2 < \cdots < x_n < 1$ and sufficiently small increments $h_1, h_2, \ldots, h_n$ such that $[x_1, x_1 + h_1], [x_2, x_2 + h_2], \ldots, [x_n, x_n + h_n]$ comprise nonoverlapping intervals, we obtain

$$\int_{x_n}^{x_n + h_n} \cdots \int_{x_1}^{x_1 + h_1} f^*(\xi_1, \ldots, \xi_n)\, d\xi_1 \cdots d\xi_n = \Pr\{x_i \le X_i^* \le x_i + h_i,\; i = 1, 2, \ldots, n\}$$

$$= \sum_{\sigma} \Pr\{x_i \le X_{\sigma_i} \le x_i + h_i,\; i = 1, \ldots, n\}$$

$$= \sum_{\sigma} \prod_{i=1}^{n} \Pr\{x_i \le X_{\sigma_i} \le x_i + h_i\} \quad \text{(by independence)}$$

$$= \sum_{\sigma} \prod_{i=1}^{n} h_i = n!\, h_1 h_2 \cdots h_n, \tag{1.4}$$

where the sums extend over all permutations $\sigma$ of $(1, 2, \ldots, n)$, and where we have used the independence and uniform distribution of the $X_i$ ($i = 1, \ldots, n$) and the fact that there are $n!$ permutations of the indices $1, 2, \ldots, n$.
Now if we let each $h_i$ shrink to zero it follows that the joint probability density function of the order statistics $X_1^*, \ldots, X_n^*$ is

$$f^*(x_1, \ldots, x_n) = \begin{cases} n! & \text{for } 0 \le x_1 \le x_2 \le \cdots \le x_n \le 1, \\ 0 & \text{elsewhere.} \end{cases} \tag{1.5}$$

The same proof shows that if the original $X_1, \ldots, X_n$ were taken from a uniform distribution over the interval $[a, b]$, the corresponding joint probability density function of the order statistics would be

$$f^*(x_1, \ldots, x_n) = \begin{cases} \dfrac{n!}{(b - a)^n} & \text{for } a \le x_1 \le x_2 \le \cdots \le x_n \le b, \\ 0 & \text{elsewhere.} \end{cases} \tag{1.6}$$


We encountered the density function (1.5) in our discussion of Poisson processes (see page 126 of A First Course). Specifically, let $\{Y(t), 0 \le t \le 1\}$ be a Poisson process; in particular, for every $t \in [0, 1]$, $Y(t)$ is a discrete random variable with probability density function

$$P_k(t) = \begin{cases} e^{-\lambda t} \dfrac{(\lambda t)^k}{k!} & \text{for } k = 0, 1, 2, \ldots, \\ 0 & \text{elsewhere,} \end{cases}$$

where $\lambda > 0$ is a fixed parameter. Assume that $Y(0) = 0$. Now, under the condition that $Y(1) = n$ (a positive integer) there will be exactly $n$ time points in the interval [0, 1] at which $Y(t)$ makes a "jump." The exact locations in [0, 1] of these jumps, depending on chance, define the random variables

$$T_1, T_2, \ldots, T_n \qquad (T_1 \le T_2 \le \cdots \le T_n)$$

with values in [0, 1].
We make the following assertion: Under the condition that

$$Y(1) = n$$

the random variables $T_1, T_2, \ldots, T_n$ are distributed as a set of order statistics of size $n$ taken from a uniform distribution over [0, 1].

The proof of this statement is quite immediate in view of the results already obtained. In fact, the evaluation of the conditional density function of $T_1, T_2, \ldots, T_n$ under the condition $Y(1) = n$ was given in Theorem 2.3 of Chapter 4. Comparison with (1.5) shows that these formulas agree and our assertion is hereby established.

The identification of order statistics and the conditioned occurrence of events of Poisson processes simplifies the derivation of other properties of order statistics.

For example, as earlier, let $X_1^*, X_2^*, \ldots, X_n^*$ denote order statistics based on a sample of size $n$ from the uniform distribution. We claim that the joint distribution of $X_1^*, X_2^*, \ldots, X_{k-1}^*$ under the condition that $X_k^* = c_k, \ldots, X_n^* = c_n$ will be that of $k - 1$ order statistics associated with the sample $X_1, \ldots, X_{k-1}$, where each r.v. follows a uniform law on $[0, c_k]$. To verify this fact, we pass to the formulation of the problem in terms of events of a Poisson process.

Let $0 \le T_1 \le T_2 \le \cdots \le T_n \le 1$ denote the times at which the events of a Poisson process $Y(t)$ occur under the condition $Y(1) = n$. Suppose we impose the further conditions $T_k = c_k, T_{k+1} = c_{k+1}, \ldots, T_n = c_n$ and seek to determine the joint distribution of $T_1, T_2, \ldots, T_{k-1}$. Because $Y(t)$ is a process with independent increments it is clear that the only information pertinent to $T_1, T_2, \ldots, T_{k-1}$ can be summarized in the assertion $T_k = c_k$ or in the equivalent statement $Y(c_k - \varepsilon) = k - 1$ for $\varepsilon$ positive and sufficiently small. Under this last condition we know that $T_1, \ldots, T_{k-1}$ are distributed as the order statistics of a $k - 1$ sample from a uniform distribution on $[0, c_k]$. Thus the joint conditional distribution of $X_1^*, X_2^*, \ldots, X_{k-1}^*$ under the condition $X_k^* = c_k, \ldots, X_n^* = c_n$ is the joint distribution of a size $k - 1$ order statistic taken from a uniform distribution over $[0, c_k]$. Thus the conditional density is given by

$$f^*(x_1, \ldots, x_{k-1} \mid c_k, \ldots, c_n) = \begin{cases} \dfrac{(k - 1)!}{c_k^{k-1}} & \text{for } 0 \le x_1 \le \cdots \le x_{k-1} \le c_k, \\ 0 & \text{elsewhere.} \end{cases} \tag{1.7}$$


By exactly the same reasoning we may deduce the fact that the conditional density function of $X_{k+1}^*, \ldots, X_n^*$, given that the values of the first $k$ order statistics are

$$X_1^* = c_1, \ldots, X_k^* = c_k,$$

is equal to the joint density function of $n - k$ order statistics based on a sample of $n - k$ independent observations each following a uniform distribution over $[c_k, 1]$:

$$f^*(x_{k+1}, \ldots, x_n \mid c_1, \ldots, c_k) = \begin{cases} \dfrac{(n - k)!}{(1 - c_k)^{n-k}} & \text{for } c_k \le x_{k+1} \le \cdots \le x_n \le 1, \\ 0 & \text{elsewhere.} \end{cases} \tag{1.8}$$

Formulas (1.7) and (1.8) exhibit the feature that these joint conditional density functions are dependent on $c_k$ only and independent of $c_i$, $i = k + 1, \ldots, n$ and $1, 2, \ldots, k - 1$, respectively. This means that the joint conditional density functions of $X_1^*, \ldots, X_{k-1}^*$ and $X_{k+1}^*, \ldots, X_n^*$ under the same single condition $X_k^* = c_k$ have the form

$$f^*(x_1, \ldots, x_{k-1} \mid c_k) = \begin{cases} \dfrac{(k - 1)!}{c_k^{k-1}} & \text{for } 0 \le x_1 \le \cdots \le x_{k-1} \le c_k, \\ 0 & \text{elsewhere,} \end{cases} \tag{1.9}$$

and

$$f^*(x_{k+1}, \ldots, x_n \mid c_k) = \begin{cases} \dfrac{(n - k)!}{(1 - c_k)^{n-k}} & \text{for } c_k \le x_{k+1} \le \cdots \le x_n \le 1, \\ 0 & \text{elsewhere,} \end{cases} \tag{1.10}$$

respectively.

Formulas (1.7) and (1.8) in conjunction with (1.9) and (1.10) also show that the sets of variables $X_1^*, \ldots, X_{k-1}^*$ and $X_{k+1}^*, \ldots, X_n^*$ under the condition $X_k^* = c_k$ are (conditionally) independent.

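More generally, the two sets of random variables

$$X_1^*, \ldots, X_i^* \quad \text{and} \quad X_{k+1}^*, \ldots, X_n^* \qquad (i < k)$$

will be conditionally independent given the values of the remaining variables $X_{i+1}^*, \ldots, X_k^*$.

By the same reasoning we may obtain the joint density function of any number of consecutive order statistics. Thus, the joint conditional density function of $X_1^*, \ldots, X_i^*, X_{k+1}^*, \ldots, X_n^*$ ($i < k$), given $X_{i+1}^* = x_{i+1}, \ldots, X_k^* = x_k$, is

$$f^*(x_1, \ldots, x_i, x_{k+1}, \ldots, x_n \mid x_{i+1}, \ldots, x_k) = \frac{f^*(x_1, \ldots, x_n)}{f^*(x_{i+1}, \ldots, x_k)}. \tag{1.11}$$

On the other hand, by the above assertion of independence, the left-hand side is also equal to

$$f^*(x_1, \ldots, x_i \mid x_{i+1}, \ldots, x_k)\, f^*(x_{k+1}, \ldots, x_n \mid x_{i+1}, \ldots, x_k) = f^*(x_1, \ldots, x_i \mid x_{i+1})\, f^*(x_{k+1}, \ldots, x_n \mid x_k) = \frac{i!}{x_{i+1}^{i}} \cdot \frac{(n - k)!}{(1 - x_k)^{n-k}}.$$

From this expression, (1.11), and (1.5) we obtain for $0 \le i < k \le n$

$$f^*(x_{i+1}, \ldots, x_k) = \begin{cases} \dfrac{n!}{i!\,(n - k)!}\, x_{i+1}^{i} (1 - x_k)^{n-k} & \text{for } 0 \le x_{i+1} \le \cdots \le x_k \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1.12}$$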


In particular, (1.12) with $i + 1 = k$ gives the marginal density function of $X_k^*$:

$$f^*(x_k) = \begin{cases} \dfrac{n!}{(k - 1)!\,(n - k)!}\, x_k^{k-1} (1 - x_k)^{n-k} & \text{for } 0 \le x_k \le 1, \\ 0 & \text{elsewhere,} \end{cases} \tag{1.13}$$

which is the density function of a beta distribution.
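
As a quick check on (1.13), its moments can be computed exactly by expanding $(1 - x)^{n-k}$ with the binomial theorem and integrating term by term. The following is a minimal sketch; the function name `order_statistic_moment` is ours, introduced only for illustration. It confirms that the density integrates to 1 and that $E[X_k^*] = k/(n + 1)$ for the sample sizes tried.

```python
from fractions import Fraction
from math import comb, factorial

def order_statistic_moment(n, k, m=0):
    """Exact m-th moment of the density (1.13) of the k-th of n uniform order statistics."""
    constant = Fraction(factorial(n), factorial(k - 1) * factorial(n - k))
    integral = Fraction(0)
    for j in range(n - k + 1):
        # (1 - x)^(n-k) contributes (-1)^j C(n-k, j) x^j; the integral of
        # x^(k-1+m+j) over [0, 1] equals 1 / (k + m + j).
        integral += Fraction((-1) ** j * comb(n - k, j), k + m + j)
    return constant * integral

total_mass = order_statistic_moment(5, 2)       # should be 1
mean_value = order_statistic_moment(5, 2, m=1)  # should be k/(n+1) = 1/3
```

The value $k/(n + 1)$ agrees with the intuition that the $n$ order statistics divide [0, 1] into $n + 1$ pieces of equal expected length.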
The order statistics $X_1^*, X_2^*, \ldots, X_n^*$ partition the interval [0, 1] into $n + 1$ disjoint intervals with lengths

$$U_1 = X_1^*,\quad U_2 = X_2^* - X_1^*,\quad \ldots,\quad U_n = X_n^* - X_{n-1}^*,\quad U_{n+1} = 1 - X_n^*.$$

Then $U_1, U_2, \ldots, U_n, U_{n+1}$ are obviously not independent random variables since $\sum_{i=1}^{n+1} U_i = 1$. By executing the transformation of variables $(x_1^*, \ldots, x_n^*)$ into $(u_1, \ldots, u_n)$, we have

$$u_1 = x_1^*, \qquad u_2 = -x_1^* + x_2^*, \qquad \ldots, \qquad u_n = -x_{n-1}^* + x_n^*, \tag{1.14}$$

and calculating the Jacobian of this transformation, which in this case is identically equal to 1, we may determine the joint density function $g(u_1, \ldots, u_n)$ of the random variables $\{U_1, \ldots, U_n\}$. Thus

$$g(u_1, \ldots, u_n) = \begin{cases} n! & \text{for } u_i \ge 0 \; (i = 1, \ldots, n),\; \sum_{i=1}^{n} u_i \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{1.15}$$

We can express this by saying that the random variables $\{U_1, \ldots, U_n\}$ are "uniformly" distributed over the region

$$u_i \ge 0 \quad (i = 1, \ldots, n), \qquad \sum_{i=1}^{n} u_i \le 1.$$

Equivalently this also determines the distribution of the random variables $\{U_1, \ldots, U_n, U_{n+1}\}$ in the region

$$u_i \ge 0 \quad (i = 1, \ldots, n, n + 1), \qquad \sum_{i=1}^{n+1} u_i = 1.$$
We will now demonstrate that the joint distribution of $\{U_1, \ldots, U_{n+1}\}$ is the same as that of

$$\left( \frac{Y_1}{S}, \frac{Y_2}{S}, \ldots, \frac{Y_n}{S}, \frac{Y_{n+1}}{S} \right),$$

where $S = Y_1 + \cdots + Y_n + Y_{n+1}$ and the $Y_i$, $i = 1, \ldots, n, n + 1$, are independent exponentially distributed random variables with parameter $\lambda > 0$. This result can be demonstrated by defining a related Poisson process and analyzing the problem in these terms. For the sake of variety, we present a proof involving direct computation.

For this purpose we write the joint density function of $(Y_1, \ldots, Y_{n+1})$,

$$f(y_1, \ldots, y_{n+1}) = \begin{cases} \lambda^{n+1} \exp\Bigl( -\lambda \sum_{i=1}^{n+1} y_i \Bigr) & \text{for } y_i \ge 0,\; i = 1, \ldots, n + 1, \\ 0 & \text{elsewhere,} \end{cases}$$
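and make the transformation

$$v_1 = \frac{y_1}{y_1 + \cdots + y_{n+1}}, \quad v_2 = \frac{y_2}{y_1 + \cdots + y_{n+1}}, \quad \ldots, \quad v_n = \frac{y_n}{y_1 + \cdots + y_{n+1}}, \quad v_{n+1} = y_1 + \cdots + y_{n+1}.$$

The inverse of this transformation is

$$y_1 = v_1 v_{n+1}, \quad y_2 = v_2 v_{n+1}, \quad \ldots, \quad y_n = v_n v_{n+1}, \quad y_{n+1} = v_{n+1} [1 - (v_1 + \cdots + v_n)].$$

From this the Jacobian can be computed:

$$J = \begin{vmatrix} v_{n+1} & 0 & \cdots & 0 & v_1 \\ 0 & v_{n+1} & \cdots & 0 & v_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & v_{n+1} & v_n \\ -v_{n+1} & -v_{n+1} & \cdots & -v_{n+1} & 1 - (v_1 + \cdots + v_n) \end{vmatrix} = \begin{vmatrix} v_{n+1} & 0 & \cdots & 0 & v_1 \\ 0 & v_{n+1} & \cdots & 0 & v_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & v_{n+1} & v_n \\ 0 & 0 & \cdots & 0 & 1 \end{vmatrix} = v_{n+1}^{n},$$

where the second determinant is obtained by adding the first $n$ rows to the last row. Hence the joint density function of the transformed variables is

$$f(v_1, \ldots, v_n, v_{n+1}) = \begin{cases} \lambda^{n+1} \exp(-\lambda v_{n+1})\, v_{n+1}^{n} & \text{for } v_i \ge 0,\; i = 1, \ldots, n + 1,\; \sum_{i=1}^{n} v_i \le 1, \\ 0 & \text{otherwise.} \end{cases}$$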


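From this representation we may infer that $S$ and the random vector $(Y_1/S, \ldots, Y_n/S)$ are independent and possess the respective marginal density functions

$$f_{n+1}(v_{n+1}) = \begin{cases} \dfrac{\lambda^{n+1}}{n!} \exp(-\lambda v_{n+1})\, v_{n+1}^{n} & \text{for } v_{n+1} \ge 0, \\ 0 & \text{elsewhere,} \end{cases}$$

and

$$f_{1, \ldots, n}(v_1, \ldots, v_n) = \begin{cases} n! & \text{for } v_i \ge 0,\; i = 1, \ldots, n,\; \sum_{i=1}^{n} v_i \le 1, \\ 0 & \text{elsewhere.} \end{cases} \tag{1.16}$$

Since (1.16) agrees with (1.15) and since

$$\frac{Y_{n+1}}{S} = 1 - \frac{Y_1}{S} - \cdots - \frac{Y_n}{S},$$

the assertion about the equality of the distributions of

$$(U_1, \ldots, U_{n+1}) \quad \text{and} \quad \left( \frac{Y_1}{S}, \ldots, \frac{Y_{n+1}}{S} \right)$$

is proved.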
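
The equality of distributions just proved can be illustrated numerically. The sketch below is a Monte Carlo illustration under our own assumptions (sample size, seeds, $n = 4$, and all function names are arbitrary choices, not from the text). It samples the first spacing $U_1$ from sorted uniforms and the ratio $Y_1/S$ from exponentials, then compares their first two empirical moments. By (1.13) with $k = 1$, both should be near $1/(n + 1)$ and $2/((n + 1)(n + 2))$.

```python
import random

def uniform_spacings(n, rng):
    """Spacings U_1, ..., U_{n+1} cut out of [0, 1] by n sorted uniform points."""
    points = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
    return [points[i + 1] - points[i] for i in range(n + 1)]

def normalized_exponentials(n, rng, lam=1.0):
    """The vector (Y_1/S, ..., Y_{n+1}/S) for independent exponential Y_i."""
    ys = [rng.expovariate(lam) for _ in range(n + 1)]
    s = sum(ys)
    return [y / s for y in ys]

def first_two_moments(sampler, n, trials, seed):
    """Empirical mean and second moment of the first coordinate of the sampler."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    m1 = m2 = 0.0
    for _ in range(trials):
        first = sampler(n, rng)[0]
        m1 += first
        m2 += first * first
    return m1 / trials, m2 / trials

n, trials = 4, 20000
spacing_moments = first_two_moments(uniform_spacings, n, trials, seed=1)
ratio_moments = first_two_moments(normalized_exponentials, n, trials, seed=2)
```

Agreement of two moments is of course only a sanity check, not a proof; the computation above with the Jacobian is what establishes the result. The parameter `lam` plays no role in the ratios, which is consistent with $\lambda$ cancelling in (1.16).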
2: The Ballot Problem 


We now intend to present several applications of Poisson processes and related 
order statistics to various random variables connected with empirical distri- 
bution functions. To this end, we first develop some results familiar under the 
name of the ballot problem, which are of considerable interest and utility in 
their own right. 

The ballot problem can be stated as follows:

In a ballot where $c$ votes are cast, candidates A and B receive $a$ and $b$ votes, respectively, $a + b = c$. Throughout the counting of the votes the lead may continually change hands. The ballot problem (in its simplest version) consists of the question: Assuming that $a > b$, what is the probability that candidate A will always lead (at least by one vote) throughout the counting of the votes?

A direct solution of the ballot problem runs as follows. Consider a fixed arrangement of the $a$ A's and the $b$ B's on a circle. For the given arrangement we will determine the number of starting positions from which, say, going clockwise one full circle, A will always lead in the counting.

To find these positions, we delete successively all adjacent pairs AB, going around the circle perhaps several times. We are finally left with $a - b$ A's. A little reflection will convince the reader that the places left are exactly the starting positions from which A always leads. It follows that the probability that A always leads, where the possible observation is a prescribed arrangement or one of its cyclic permutations, is $(a - b)/(a + b)$. This quantity is independent of the choice of the prescribed sequence. It follows that the probability that candidate A will always lead throughout the vote counting is $(a - b)/(a + b)$.
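
Both claims in this argument can be checked by brute force for small $a$ and $b$. The following sketch (function names are ours, chosen for illustration) encodes a vote for A as $+1$ and a vote for B as $-1$, enumerates all arrangements to obtain the probability that A always leads, and counts the favorable starting positions on the circle for one particular arrangement.

```python
from fractions import Fraction
from itertools import combinations

def always_leads(votes):
    """True if every partial sum is positive, i.e., A leads after each vote."""
    lead = 0
    for vote in votes:
        lead += vote
        if lead <= 0:
            return False
    return True

def ballot_probability(a, b):
    """Exact probability that A always leads, by enumerating all arrangements."""
    c = a + b
    good = total = 0
    for b_positions in combinations(range(c), b):
        chosen = set(b_positions)
        votes = [-1 if i in chosen else 1 for i in range(c)]
        total += 1
        good += always_leads(votes)
    return Fraction(good, total)

def favorable_starts(votes):
    """Number of starting positions on the circle from which A always leads."""
    return sum(always_leads(votes[s:] + votes[:s]) for s in range(len(votes)))

probability = ballot_probability(5, 3)                   # (a - b)/(a + b) = 1/4
starts = favorable_starts([1, 1, -1, 1, -1, 1, 1, -1])   # a = 5, b = 3
```

For the circular arrangement shown, exactly $a - b = 2$ starting positions are favorable, as the deletion argument predicts.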
The above analysis is very elegant. However, since we have in mind other generalizations we now formulate the ballot problem in a more general context of drawing cards from an urn and analyze its structure.

An urn contains $a$ cards each labeled 0 and $b$ cards each labeled 2; $a + b = c$, $a > b$. The cards are selected one by one in a random way from the urn without replacement until the last card is drawn. Let $\nu_i$ be the random variable equal to the number on the $i$th card drawn; $i = 1, \ldots, c$. Then the ballot problem will be solved by finding

$$\Pr\{\nu_1 + \nu_2 + \cdots + \nu_r < r \text{ for } r = 1, 2, \ldots, c\}. \tag{2.1}$$

This assertion follows from the observation that if among the first $r$ drawings there are $\alpha$ 0's and $\beta$ 2's ($\alpha + \beta = r$) then the condition $\nu_1 + \cdots + \nu_r < r$ means that $\alpha \cdot 0 + \beta \cdot 2 < \alpha + \beta$, which reduces to $\beta < \alpha$. Obviously, (2.1) is the probability that this inequality holds for $r = 1, 2, \ldots, c$.

To find the probability (2.1) first notice that $(\nu_1, \nu_2, \ldots, \nu_c)$ is an arrangement of $a$ 0's and $b$ 2's and any one of the $c!/(a!\,b!)$ possible arrangements is equally likely to occur. That means that for every $r$ ($r = 1, \ldots, c$) and every set $i_1, \ldots, i_r$ of distinct members of $\{1, \ldots, c\}$ the joint distribution of $(\nu_{i_1}, \nu_{i_2}, \ldots, \nu_{i_r})$ is the same as that of $(\nu_1, \ldots, \nu_r)$. This fact is expressed by saying that the random variables $\nu_1, \ldots, \nu_c$ are interchangeable. (Independent, identically distributed random variables are, of course, interchangeable.) Then, since

$$\nu_1 + \nu_2 + \cdots + \nu_c = a \cdot 0 + b \cdot 2 = 2b, \tag{2.2}$$

we have

$$\sum_{i=1}^{c} E[\nu_i] = E[\nu_1 + \nu_2 + \cdots + \nu_c] = 2b.$$

Since the r.v.'s $\nu_1, \nu_2, \ldots, \nu_c$ are interchangeable, each $\nu_i$ has the same marginal distribution and therefore

$$c E[\nu_i] = 2b \quad \text{or} \quad E[\nu_i] = \frac{2b}{c} \quad \text{for } i = 1, 2, \ldots, c. \tag{2.3}$$

Using this fact we now prove by induction with respect to $c$ that

$$\Pr\{\nu_1 + \cdots + \nu_r < r \text{ for } r = 1, \ldots, c \mid \nu_1 + \nu_2 + \cdots + \nu_c = 2b\} = 1 - \frac{2b}{c}. \tag{2.4}$$

First we show that (2.4) holds for $c = 1$. But $c = 1$ implies that $a = 1$ and $b = 0$, since $a > b \ge 0$ is assumed in the statement of the problem. Then clearly

$$\Pr\{\nu_1 < 1\} = \Pr\{\nu_1 = 0\} = 1.$$

The assertion of (2.4) is trivially true for $2b = c$, since in this case $\nu_1 + \cdots + \nu_c = 2b = c$.

Now, assume (2.4) holds for all $c \le n - 1$ and $0 \le 2b \le c$. We wish to prove that (2.4) also holds for $c = n$ and $0 \le 2b < c$.

Let $b'$ be an integer such that $0 \le b' \le b$; then

$$\Pr\{\nu_1 + \cdots + \nu_r < r \text{ for } r = 1, \ldots, c \mid \nu_1 + \cdots + \nu_{2b} = 2b'\} = \Pr\{\nu_1 + \cdots + \nu_r < r \text{ for } r = 1, \ldots, 2b \mid \nu_1 + \cdots + \nu_{2b} = 2b'\} \tag{2.5}$$

because the inequality $\nu_1 + \cdots + \nu_r < r$ is always satisfied for $r = 2b + 1, \ldots, c$ by the condition (2.2). But the right-hand side of (2.5) is the same type of expression as the left-hand side of (2.4), subject to a condition of the form (2.2). In fact, in (2.4) just replace $c$ by $2b$ and $2b$ by $2b'$. Using the induction hypotheses with $c$ and $b$ replaced by $2b$ and $b'$, respectively, yields

$$\Pr\{\nu_1 + \cdots + \nu_r < r \text{ for } r = 1, \ldots, 2b \mid \nu_1 + \cdots + \nu_{2b} = 2b'\} = 1 - \frac{2b'}{2b}. \tag{2.6}$$

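To complete our induction proof write

$$\Pr\{\nu_1 + \cdots + \nu_r < r \text{ for } r = 1, \ldots, n\} = \sum_{b'=0}^{b} \Pr\{\nu_1 + \cdots + \nu_r < r \text{ for } r = 1, \ldots, n \mid \nu_1 + \cdots + \nu_{2b} = 2b'\} \Pr\{\nu_1 + \cdots + \nu_{2b} = 2b'\}$$

$$= \sum_{b'=0}^{b} \left( 1 - \frac{2b'}{2b} \right) \Pr\{\nu_1 + \cdots + \nu_{2b} = 2b'\}, \tag{2.7}$$

by (2.6). But

$$\sum_{b'=0}^{b} 2b' \Pr\{\nu_1 + \cdots + \nu_{2b} = 2b'\} = E[\nu_1 + \cdots + \nu_{2b}] = 2b \cdot \frac{2b}{n}$$

by (2.3). Hence from (2.7)

$$\Pr\{\nu_1 + \cdots + \nu_r < r \text{ for } r = 1, \ldots, n\} = \sum_{b'=0}^{b} \Pr\{\nu_1 + \cdots + \nu_{2b} = 2b'\} - \frac{1}{2b} \sum_{b'=0}^{b} 2b' \Pr\{\nu_1 + \cdots + \nu_{2b} = 2b'\} = 1 - \frac{1}{2b} \cdot 2b \cdot \frac{2b}{n} = 1 - \frac{2b}{n}.$$

So (2.4) holds for $c = n$ and $0 \le 2b < n$. This completes the induction proof.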


A generalization of the ballot problem consists of having the urn contain $c$ cards with numbers $k_1, \ldots, k_c$ such that

$$k_i = \text{nonnegative integer}, \quad i = 1, \ldots, c, \qquad k_1 + \cdots + k_c = k, \quad 0 \le k \le c.$$

If $\nu_i$ again denotes the number on the $i$th drawing from the urn without replacement, then (2.4) generalizes to

$$\Pr\{\nu_1 + \cdots + \nu_r < r \text{ for } r = 1, \ldots, c\} = 1 - \frac{k}{c}. \tag{2.8}$$

The proof of this more general statement can be carried out by repeating almost verbatim the proof of (2.4). We omit the details. (The student should check the steps for himself.)
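
Formula (2.8) is easy to test by exhaustive enumeration for a small urn. In the sketch below the card values are hypothetical examples and the function names are ours. All $c!$ draw orders are treated as equally likely (cards with equal labels are regarded as distinguishable, which does not change the probability).

```python
from fractions import Fraction
from itertools import permutations

def stays_below_diagonal(draws):
    """True if nu_1 + ... + nu_r < r for every r = 1, ..., c."""
    total = 0
    for r, value in enumerate(draws, start=1):
        total += value
        if total >= r:
            return False
    return True

def generalized_ballot_probability(cards):
    """Exact probability in (2.8), by enumerating every order of drawing the cards."""
    orders = list(permutations(cards))
    good = sum(stays_below_diagonal(order) for order in orders)
    return Fraction(good, len(orders))

# Hypothetical urn: c = 5 cards whose labels sum to k = 3.
result = generalized_ballot_probability([0, 0, 0, 1, 2])   # 1 - k/c = 2/5
# The classical ballot problem with a = 3 zeros and b = 2 twos.
classical = generalized_ballot_probability([0, 0, 0, 2, 2])  # 1 - 2b/c = 1/5
```

Enumeration is feasible only for small $c$, since the number of orders grows as $c!$; it serves here as a check on the formula, not as a method of computation.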

A further generalization leads to the following problem. Let $\nu_1, \ldots, \nu_n$ be nonnegative interchangeable random variables with

$$\nu_1 + \cdots + \nu_n = y \quad \text{(fixed number)}. \tag{2.9}$$

Let $\tau_1, \ldots, \tau_n$ be the order statistics based on $n$ independent observations each uniformly distributed over $[0, t]$. Assume further that the random variables $\nu_i$ ($i = 1, \ldots, n$) are independent of the variables $\tau_j$ ($j = 1, \ldots, n$).

Imagine the step function (see Fig. 1)

$$f(x) = \begin{cases} 0 & \text{if } 0 \le x < \tau_1, \\ \nu_1 + \cdots + \nu_r & \text{if } \tau_r \le x < \tau_{r+1},\; r = 1, \ldots, n - 1, \\ \nu_1 + \cdots + \nu_n & \text{if } \tau_n \le x \le t. \end{cases}$$

Then we pose the problem: What is the probability that the graph of $f(x)$ will not cross the 45° line? The analytic statement of this problem with its solution is

$$\Pr\{\nu_1 + \cdots + \nu_r \le \tau_r \text{ for } r = 1, \ldots, n\} = \begin{cases} 1 - \dfrac{y}{t} & \text{if } 0 \le y \le t, \\ 0 & \text{if } y > t. \end{cases} \tag{2.10}$$
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y F(x) 


0 T1 T2 T3 T4 ooo Tn t 


FIG. 1 


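We will prove (2.10) by induction. For n = 1,

$$\Pr\{\nu_1 \le \tau_1\} = \begin{cases}
1 - \dfrac{y}{t} & \text{if } 0 \le y \le t, \\
0 & \text{if } y > t,
\end{cases}$$

since $\nu_1 = y$ and $\tau_1$ is uniformly distributed on [0, t].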
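Assume now that (2.10) holds for n - 1 in place of n. Imposing the conditions

$$\nu_1 + \cdots + \nu_{n-1} = z \quad (0 \le z \le y)$$

and

$$\tau_n = u \quad (0 \le u \le t), \tag{2.11}$$

let us compute the conditional probability

$$\Pr\{\nu_1 + \cdots + \nu_r \le \tau_r \ \text{for } r = 1, \ldots, n \mid \nu_1 + \cdots + \nu_{n-1} = z, \ \tau_n = u\}. \tag{2.12}$$

Assume first that $y \le u \le t$. Then by (2.9), the inequality

$$\nu_1 + \cdots + \nu_n \le \tau_n$$

will certainly prevail under the condition (2.11). Hence (2.12) equals

$$\Pr\{\nu_1 + \cdots + \nu_r \le \tau_r \ \text{for } r = 1, \ldots, n - 1 \mid \nu_1 + \cdots + \nu_{n-1} = z, \ \tau_n = u\} = \begin{cases}
1 - \dfrac{z}{u} & \text{if } 0 \le z \le u, \\
0 & \text{if } z > u,
\end{cases}$$

by the induction hypothesis, since $(\tau_1, \ldots, \tau_{n-1})$ under the conditions (2.11)
are order statistics taken from n - 1 independent random variables uniformly
distributed over [0, u].

Now, remembering that $\nu_i$ is independent of $\tau_j$ for all i, j = 1, ..., n, consider
the density function

$$\varphi(z) = \frac{d}{dz} \Pr\{\nu_1 + \cdots + \nu_{n-1} \le z\}.$$

Then for $y \le u \le t$

$$\begin{aligned}
\Pr\{\nu_1 + \cdots + \nu_r \le \tau_r \ \text{for } r = 1, \ldots, n \mid \tau_n = u\}
&= \int_0^y \Pr\{\nu_1 + \cdots + \nu_r \le \tau_r \ \text{for } r = 1, \ldots, n \mid \nu_1 + \cdots + \nu_{n-1} = z, \ \tau_n = u\} \, \varphi(z) \, dz \\
&= \int_0^y \left(1 - \frac{z}{u}\right) \varphi(z) \, dz = 1 - \frac{1}{u} \int_0^y z \varphi(z) \, dz \\
&= 1 - \frac{1}{u} E[\nu_1 + \cdots + \nu_{n-1}] = 1 - \frac{n - 1}{n} \cdot \frac{y}{u},
\end{aligned} \tag{2.13}$$

because (2.9) and the interchangeability of $\nu_1, \ldots, \nu_n$ imply

$$E[\nu_i] = \frac{y}{n} \quad \text{for } i = 1, \ldots, n.$$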


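For u < y, however,

$$\Pr\{\nu_1 + \cdots + \nu_r \le \tau_r \ \text{for } r = 1, \ldots, n \mid \tau_n = u\} = 0, \tag{2.14}$$

since under condition (2.11),

$$\nu_1 + \cdots + \nu_n \le \tau_n$$

cannot hold. Now by (1.13) the probability density function of $\tau_n$ is

$$\psi(u) = \begin{cases}
\dfrac{n}{t} \left(\dfrac{u}{t}\right)^{n-1} & \text{if } 0 \le u \le t, \\
0 & \text{otherwise.}
\end{cases}$$

Then by (2.13) and (2.14), for $0 \le y \le t$,

$$\begin{aligned}
\Pr\{\nu_1 + \cdots + \nu_r \le \tau_r \ \text{for } r = 1, \ldots, n\}
&= \int_0^t \Pr\{\nu_1 + \cdots + \nu_r \le \tau_r \ \text{for } r = 1, \ldots, n \mid \tau_n = u\} \, \psi(u) \, du \\
&= \int_y^t \left(1 - \frac{n - 1}{n} \cdot \frac{y}{u}\right) \frac{n}{t} \left(\frac{u}{t}\right)^{n-1} du = 1 - \frac{y}{t}.
\end{aligned}$$

For y > t, obviously,

$$\Pr\{\nu_1 + \cdots + \nu_r \le \tau_r \ \text{for } r = 1, \ldots, n\} = 0.$$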


This proves (2.10). 
We are now prepared to develop some applications of the preceding ideas 
to order statistics. 
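
The formula (2.10) lends itself to a quick Monte Carlo check. The sketch below is only an illustration under stated assumptions: the interchangeable variables are produced by randomly permuting a fixed list of nonnegative weights (one convenient way to obtain interchangeability with a fixed sum y), and the names `estimate_no_crossing`, `weights`, `trials`, and `seed` are hypothetical.

```python
import random


def estimate_no_crossing(weights, t, trials=100_000, seed=0):
    """Estimate Pr{v_1 + ... + v_r <= tau_r for r = 1, ..., n}, where
    (v_1, ..., v_n) is a uniformly random permutation of `weights` and
    tau_1 <= ... <= tau_n are order statistics of n uniforms on [0, t]."""
    rng = random.Random(seed)
    n = len(weights)
    hits = 0
    for _ in range(trials):
        v = list(weights)
        rng.shuffle(v)                      # interchangeable, fixed sum y
        taus = sorted(rng.uniform(0.0, t) for _ in range(n))
        partial = 0.0
        ok = True
        for value, tau in zip(v, taus):
            partial += value
            if partial > tau:               # the graph crosses the 45-degree line
                ok = False
                break
        hits += ok
    return hits / trials


weights = [0.1, 0.2, 0.3, 0.15]             # y = 0.75
print(estimate_no_crossing(weights, t=1.5)) # should be near 1 - y/t = 0.5
```

The estimate fluctuates with the seed and the number of trials; only agreement to within sampling error should be expected.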


3: Empirical Distribution Functions 


An important class of problems connected with order statistics concerns the
empirical cumulative distribution function of a random variable. If X is a
random variable with distribution function F(x) and

$$(X_1^*, X_2^*, \ldots, X_n^*)$$

is the set of order statistics corresponding to a sample of size n from F(x), then
the empirical cumulative distribution function $F_n(x)$ of X is a random variable
defined as

$$F_n(x) = \begin{cases}
0 & \text{if } x < X_1^*, \\
\dfrac{k}{n} & \text{if } X_k^* \le x < X_{k+1}^*, \quad k = 1, \ldots, n - 1, \\
1 & \text{if } x \ge X_n^*.
\end{cases} \tag{3.1}$$


Let X be a random variable with a continuous strictly increasing cumulative
distribution function F(x) and empirical cumulative distribution $F_n(x)$. We
want to determine the probability

$$\Pr\{F_n(x) \le \gamma F(x) \ \text{for } -\infty < x < \infty\}. \tag{3.2}$$

As we have seen at the start of this chapter there is no loss of generality in
assuming that X is uniformly distributed over [0, 1]. Indeed, replace X by F(X),
whose corresponding observations become $F(X_1), F(X_2), \ldots, F(X_n)$. Then
(3.2) reduces to

$$\Pr\{F_n(x) \le \gamma x \ \text{for } 0 \le x \le 1\}, \tag{3.3}$$

where $F_n(x)$ is now the empirical distribution function associated with a uniform
distribution.
Figure 2 shows a typical realization of $F_n(x)$ for the uniform distribution.
We will prove that

$$\Pr\{F_n(x) \le \gamma x \ \text{for } 0 \le x \le 1\} = \begin{cases}
0 & \text{if } \gamma < 1, \\
1 - \dfrac{1}{\gamma} & \text{if } \gamma \ge 1.
\end{cases} \tag{3.4}$$

FIG. 2  (graphs of y = x and y = $F_n(x)$)


If $\gamma < 1$, then (3.4) is obviously zero since the condition $F_n(1) \le \gamma$ is violated.
The result when $\gamma \ge 1$ is a corollary of (2.10). It is only necessary to apply the
substitution t = 1, $y = 1/\gamma$. In fact, it is clear (consult Fig. 2) that the event

$$F_n(x) \le \gamma x \quad \text{for } 0 \le x \le 1$$

will occur if and only if

$$\frac{k}{n\gamma} \le X_k^*, \qquad k = 1, 2, \ldots, n, \tag{3.5}$$

where $X_k^*$ (k = 1, 2, ..., n) are the order statistics of size n whose underlying
distribution is uniform over [0, 1]. The random variables $\nu_1, \nu_2, \ldots, \nu_n$ should
be identified with the fixed numbers $\nu_i = 1/(n\gamma)$ (i = 1, ..., n), which are manifestly
interchangeable. Then (3.5) may be rewritten as

$$\nu_1 + \nu_2 + \cdots + \nu_k \le X_k^*, \qquad k = 1, 2, \ldots, n,$$

where $\nu_1 + \nu_2 + \cdots + \nu_n = 1/\gamma = y$.

The validation of (3.4) is now complete by virtue of (2.10). Note that the 
right-hand side of (3.4) is independent of n. 
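
The independence of n noted above is easy to observe numerically. The following sketch is an illustration only: it assumes the reading of (3.4) in which the multiplier is $\gamma$ and the value is $1 - 1/\gamma$ for $\gamma \ge 1$, it tests the event through the equivalent condition (3.5), and the function name `estimate_below_line` is hypothetical.

```python
import random


def estimate_below_line(n, gamma, trials=50_000, seed=0):
    """Estimate Pr{F_n(x) <= gamma * x for 0 <= x <= 1} for a uniform sample
    of size n, using the equivalent condition k/(n*gamma) <= X_k^* for all k."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = sorted(rng.random() for _ in range(n))
        if all(k / (n * gamma) <= x for k, x in enumerate(xs, start=1)):
            hits += 1
    return hits / trials


for n in (1, 4, 7):
    # every estimate should be close to 1 - 1/2, whatever the sample size
    print(n, estimate_below_line(n, gamma=2.0))
```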

The following problem connected with coin tossing also involves the
empirical cumulative distribution function. Let X be a random variable with
continuous, strictly increasing, cumulative distribution function F(x). Let
$(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ be two independent random samples of size n
from the distribution of X. Let $(X_1^*, \ldots, X_n^*)$ and $(Y_1^*, \ldots, Y_n^*)$ be the
corresponding order statistics and form the empirical cumulative distribution
functions

$$F_n(x) = \begin{cases}
0 & \text{if } x < X_1^*, \\
\dfrac{k}{n} & \text{if } X_k^* \le x < X_{k+1}^*, \quad k = 1, \ldots, n - 1, \\
1 & \text{if } x \ge X_n^*,
\end{cases}
\qquad
G_n(y) = \begin{cases}
0 & \text{if } y < Y_1^*, \\
\dfrac{l}{n} & \text{if } Y_l^* \le y < Y_{l+1}^*, \quad l = 1, \ldots, n - 1, \\
1 & \text{if } y \ge Y_n^*.
\end{cases} \tag{3.6}$$

We may again assume that F(x) is the uniform distribution over [0, 1]. As
Fig. 3 shows, $F_n(x) = G_n(x) = 0$ for $x < \min(X_1^*, Y_1^*)$ and $F_n(x) = G_n(x) = 1$
for $x \ge \max(X_n^*, Y_n^*)$. In the interval $I = [\min(X_1^*, Y_1^*), \max(X_n^*, Y_n^*)]$ one of
the two graphs may lie entirely below the other or, as in Fig. 3, they may equal or
cross each other several times. We will proceed to determine the probability
of the former event happening, that is,

$$\Pr\{F_n(x) \ne G_n(x) \ \text{on any subinterval of } I\}. \tag{3.7}$$


FIG. 3  (graphs of $F_n(x)$ and $G_n(x)$; vertical scale 1/n, 2/n, 3/n, ...; the order statistics $X_i^*$ and $Y_j^*$ are marked on the horizontal axis)
This probability may be interpreted as follows. Regard $(X_1, \ldots, X_n,
Y_1, \ldots, Y_n)$ as one sample and form the corresponding order statistics. A possible
version of this could be

$$(Y_1^*, Y_2^*, X_1^*, X_2^*, Y_3^*, X_3^*, Y_4^*, Y_5^*, Y_6^*, X_4^*, X_5^*, \ldots, X_n^*, Y_n^*)$$

as depicted in Fig. 3.

Because both samples are taken from the same distribution any such arrange- 
ment is equally likely to occur. 

At any rate we denote the order statistics as

$$(Z_1^*, Z_2^*, \ldots, Z_{2n}^*), \tag{3.8}$$

where half of the $Z^*$'s are the $X^*$'s and the other half are the $Y^*$'s. For every k
we may compute the ratio

$$\rho_k = \frac{\text{number of } X_i^*\text{'s} \le Z_k^*}{\text{number of } Y_i^*\text{'s} \le Z_k^*}.$$

Clearly $\rho_{2n} = 1$. Since the graph of $F_n(x)$ will be entirely below the graph of
$G_n(x)$ for all $x \in I$ if and only if $\rho_k < 1$ for all k = 2, 3, ..., 2n - 1, the expression
(3.7) is equivalent to

$$\Pr\{\rho_k < 1 \ \text{for } k = 2, 3, \ldots, 2n - 1 \ \text{ or } \ \rho_k > 1 \ \text{for } k = 2, 3, \ldots, 2n - 1\}. \tag{3.9}$$


Another graphical representation of the event in (3.9) can be described as follows.
Consider the step function (see Fig. 4) that makes a horizontal unit jump whenever
we encounter an $X_i^*$ in the arrangement of order statistics (3.8) and a vertical
unit jump whenever we come upon a $Y_j^*$. This step function will lie strictly above
the 45° line (except for the end point, where it will always coincide with the 45°
line) if and only if $\rho_k < 1$ for all k = 2, 3, ..., 2n - 1. Hence, (3.9) [and also
(3.7)] expresses the probability that the step function of Fig. 4 will be entirely
on one side of the 45° line (except for the end points).

FIG. 4  (a step function from (0, 0) to (n, n), with both axes marked 0, 1, 2, ..., n - 1, n)

This problem can be interpreted as a coin-tossing game as follows. Suppose 
we conduct a series of 2n tosses with a fair coin. Assuming that at the end of the 
series the cumulative number of heads equals the cumulative number of tails 
(both equal to n), we may ask: What is the probability that the number of heads 
observed will always lead the number of tails observed as the game proceeds, 
or vice versa, that is, the number of tails will lead the number of heads through- 
out? 

If we make the correspondence of horizontal jumps to heads and vertical 
jumps to tails, we thereby associate with every possible outcome of the series 
a step function as in Fig. 4. Hence the probability that heads leads tails through- 
out the series of 2n tosses or vice versa, provided they are tied at the end, equals 
the quantity in (3.9). 

We will now proceed to compute this probability for which we have given 
several interpretations. For this purpose we will refer for convenience to the 
typical random step functions of Fig. 4. These step functions lead from (0, 0) 
to (n, n) in 2n steps, n of which must be vertical. There are obviously altogether
$\binom{2n}{n}$ such step functions. To count those among them that do not have points


in common with the 45° line (except the end points) it is sufficient, for reasons
of symmetry, to count those step functions that remain entirely below the 45°
line and double their number. Every step function, however, that remains
entirely below the 45° line goes to the point (1, 0) in the first jump and then
proceeds to (n, n). Obviously, there are $\binom{2n-1}{n}$ step functions that lead to
(n, n) from (1, 0); we want to count only those that remain entirely under the 45°
line. To obtain this number we shall count the number of step functions that
lead from (1, 0) to (n, n) and have at least one point in common with the 45°
line other than the end point (n, n); then we shall subtract this number from
$\binom{2n-1}{n}$. We show first that every
step function leading from (1, 0) to (n, n) that has at least one point in common
with the 45° line and ends with a vertical jump (see Fig. 5) corresponds to a step
function leading from (1, 0) to (n, n) crossing the 45° line and ending with a
horizontal jump. To see this, take the step function that touches the 45° line
and ends in a vertical jump; let (k, k) be the point where it last touches the 45°
line before reaching (n, n). Reflect the portion of the step function between (k, k)
and (n, n) symmetrically with respect to the 45° line (see broken lines in Fig. 5).
This process clearly establishes a one-to-one correspondence between the two
kinds of step functions, those hitting the 45° line and ending with a vertical
jump and those ending with a horizontal jump after hitting the 45° line. Thus
we only have to count the step functions leading from (1, 0) to (n, n) that cross
the 45° line and end with a horizontal jump. But obviously every step function
leading from (1, 0) to (n, n) that ends with a horizontal jump must pass through
(n - 1, n), and there are exactly $\binom{2n-2}{n}$ such step functions.

FIG. 5  (a step function from (1, 0) to (n, n) touching the 45° line, with its reflected final portion shown in broken lines)

By virtue of
the reflection principle, we infer that the number of step functions that lead from
(1, 0) to (n, n) and cross or touch the 45° line with a final vertical step is also
$\binom{2n-2}{n}$. Then the number of step functions that lead from (1, 0) to (n, n)
and always remain below the 45° line [except for (n, n)] is

$$\binom{2n-1}{n} - 2\binom{2n-2}{n} = \frac{1}{n}\binom{2n-2}{n-1}.$$


This is also the number of step functions that lead from (0, 0) to (n, n) and remain 
entirely below the 45° line (except at the end points). 

Hence, the probability that a step function (in Fig. 4) chosen at random
will have no common point with the 45° line (except at the end points) is

$$2 \cdot \frac{1}{n}\binom{2n-2}{n-1} \bigg/ \binom{2n}{n} = \frac{1}{2n - 1}. \tag{3.10}$$

This is also the probability expressed by (3.7) and (3.9). The result (3.10) can 
also be derived by a straightforward application of the ballot theorem. 
Finally, the result of (3.10) can be obtained by appealing directly to the
results on coin tossing developed in Chapter 10. In view of the identifications
with coin tossing we are dealing with the first Markov chain of Section 4,
Chapter 10. In this formulation the probability in question is precisely the
probability that first passage from state 1 to state 0 occurs at trial 2n - 1,
given that the cumulative number of heads equals the cumulative number of
tails at trial 2n.

The calculation of the probability that first passage from state 1 to state 0
occurs at trial 2n - 1 with no further conditioning was carried out in Section 6
of Chapter 10 and shown to be $\frac{1}{2n-1} 2^{-2n} \binom{2n}{n}$. Therefore, the conditional
first passage probability is manifestly $f_{10}^{2n-1}/u_{2n} = 1/(2n - 1)$, as it should be.
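
The value 1/(2n - 1) can be confirmed for small n by listing all step functions from (0, 0) to (n, n). The sketch below is an illustration, not part of the original argument; the function name `prob_one_sided` is hypothetical. Each step function is encoded by the set of positions of its n horizontal jumps among the 2n steps.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_one_sided(n):
    """Exact probability that a uniformly chosen step function from (0, 0)
    to (n, n) meets the 45-degree line only at its two end points."""
    total = comb(2 * n, n)
    good = 0
    for horizontal in combinations(range(2 * n), n):
        horizontal = set(horizontal)
        x = y = 0
        ok = True
        for step in range(2 * n - 1):      # positions after steps 1, ..., 2n - 1
            if step in horizontal:
                x += 1
            else:
                y += 1
            if x == y:                     # touches the diagonal too early
                ok = False
                break
        good += ok
    return Fraction(good, total)


for n in range(1, 7):
    print(n, prob_one_sided(n), Fraction(1, 2 * n - 1))
```

Enumeration grows like $\binom{2n}{n}$, so this check is practical only for small n.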


4: Some Limit Distributions for Empirical Distribution Functions 


The following lemma is interesting in itself and will be used in establishing a 
limit distribution for empirical distribution functions. 


Lemma 4.1. Let $x = (x_1, \ldots, x_n)$ be a vector in n-dimensional Euclidean space.
Assume that

$$x_1 + x_2 + \cdots + x_n = 0$$

and that $\sum_{k=i+1}^{j} x_k \ne 0$ for $1 \le i < j \le n$.

Let $x_{k+n} = x_k$ and $x(k) = (x_k, x_{k+1}, \ldots, x_n, \ldots, x_{n+k-1})$ be a cyclic
permutation of the components of x for k = 1, 2, ..., n. Then for each r = 0, 1, ...,
n - 1 exactly one of the x(k), k = 1, ..., n, is such that r among the successive
partial sums of its components are positive.


Proof. Let $s_k = x_1 + x_2 + \cdots + x_k$ for k = 1, ..., n and $s_0 = 0$, $s_{k+n} = s_k$
for k = 0, 1, ..., n. Then the $s_k$, k = 1, ..., n, are distinct, since if we had

$$s_i = s_j \quad \text{for } 1 \le i < j \le n,$$

then $x_1 + \cdots + x_i = x_1 + \cdots + x_i + x_{i+1} + \cdots + x_j$, i.e., $x_{i+1} + \cdots + x_j
= 0$, which contradicts our assumption. Clearly, the successive partial sums of
the components of x(k) are

$$s_k - s_{k-1}, \ s_{k+1} - s_{k-1}, \ \ldots, \ s_n - s_{k-1}, \ s_{n+1} - s_{k-1}, \ \ldots, \ s_{k+n-1} - s_{k-1}. \tag{4.1}$$

This sequence is identical with the sequence

$$s_k - s_{k-1}, \ s_{k+1} - s_{k-1}, \ \ldots, \ s_n - s_{k-1}, \ s_1 - s_{k-1}, \ \ldots, \ s_{k-1} - s_{k-1}. \tag{4.2}$$

Now let

$$s_1^*, s_2^*, \ldots, s_n^*$$

be the unique relabeling of $s_1, s_2, \ldots, s_n$ for which

$$s_1^* > s_2^* > \cdots > s_n^*.$$

Such a relabeling can be achieved since the $s_k$ (k = 1, ..., n) are distinct.
The number of positive members of the sequence (4.2) (which is the same as
(4.1)) will then be the same as the number of positive members in the sequence

$$s_1^* - s_{k-1}, \ s_2^* - s_{k-1}, \ \ldots, \ s_n^* - s_{k-1}, \tag{4.3}$$

as this is simply a rearrangement of (4.2). Now for any r = 0, 1, ..., n - 1
there will be exactly one k ($1 \le k \le n$) such that exactly r members in (4.3)
(and hence also in (4.1)) will be positive. To see this just choose k such that

$$s_{k-1} = s_{r+1}^*.$$

Then $s_1^* - s_{k-1} > 0$, $s_2^* - s_{k-1} > 0$, ..., $s_r^* - s_{k-1} > 0$, but $s_{r+1}^* - s_{k-1} = 0$,
$s_{r+2}^* - s_{k-1} < 0$, ..., $s_n^* - s_{k-1} < 0$. ∎
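
The conclusion of Lemma 4.1 can be checked directly on a small example. The sketch below is illustrative only; the function name `positive_counts` and the sample vector are hypothetical choices. The vector has integer entries summing to zero and distinct partial sums, so the hypotheses of the lemma hold, and the counts of positive partial sums over the n cyclic permutations should be a rearrangement of 0, 1, ..., n - 1.

```python
def positive_counts(x):
    """For each cyclic permutation x(k) of x, count how many of its
    successive partial sums are strictly positive."""
    n = len(x)
    counts = []
    for k in range(n):
        shifted = x[k:] + x[:k]          # x(k+1) in the notation of the lemma
        partial = 0
        positives = 0
        for value in shifted:
            partial += value
            if partial > 0:
                positives += 1
        counts.append(positives)
    return counts


x = [3, -1, 4, -8, 2]                    # sums to 0; partial sums 3, 2, 6, -2, 0
print(positive_counts(x))                # each of 0, 1, 2, 3, 4 appears exactly once
```

Integer entries are used so that the comparison with zero is exact; with floating-point entries the final partial sum may differ from zero by rounding error.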


This lemma has a geometric setting. Omit the assumption $s_n = 0$. Plot the
points (0, 0), $(1, s_1)$, ..., $(n, s_n)$ in a two-dimensional Cartesian coordinate
system. Join the neighboring points with straight lines as in Fig. 6. The resulting
broken line is called the sum polygon of the vector $x = (x_1, \ldots, x_n)$. In quite the
same way we can obtain sum polygons for the vector x(k) whose components are
a cyclic permutation of those of x. The straight line segment joining the points
(0, 0) and $(n, s_n)$ is called the chord of the sum polygon. Consider the point of
intersection P of this chord with the vertical line through $(k, s_k)$. The ordinate
of P is equal to $(k/n)s_n$ from elementary geometric considerations. Hence, the
vertical distance from the vertex $(k, s_k)$ to the chord of the polygon is equal to
$s_k - (k/n)s_n$.

FIG. 6  (a sum polygon with its chord from (0, 0) to $(n, s_n)$)

Now the vector whose components are

$$x_1 - \frac{1}{n}s_n, \ x_2 - \frac{1}{n}s_n, \ \ldots, \ x_n - \frac{1}{n}s_n$$

clearly satisfies the conditions of the lemma including the requirement that the
nth partial sum vanish. The conclusion of the lemma asserts that among the n
cyclic permutations of the sum polygons of x there is for each r = 0, 1, ..., n - 1
exactly one which will have r of its vertices above its chord. Particularly, the
one for r = 0 is obtained by the cyclic permutation starting with (1 + the index
k where $\max_k [s_k - (k/n)s_n]$ is achieved). The case of r = n - 1 corresponds to
(1 + the index k where $\min_k [s_k - (k/n)s_n]$ is attained), etc.

We close this chapter with an application of Lemma 4.1 to the analysis of
certain r.v.'s connected with empirical distribution functions. Let, as usual,
$F_n(x)$ denote the empirical distribution function of a sample of size n from the
uniform distribution on (0, 1). We introduce the two random variables $U_n$
and $V_n$:

$$U_n = \text{cumulative length of all segments of } x \text{ values for which } F_n(x) > x \tag{4.4}$$

(see Fig. 7). In the particular realization depicted there, $U_n$ equals the sum of all
the darkened line segments.

FIG. 7  (graph of $F_n(x)$ with the segments where $F_n(x) > x$ darkened)


122 13. ORDER STATISTICS AND POISSON PROCESSES 


We define

$$V_n = \text{the smallest value of } x \text{ for which } F_n(x) - x = \max_{0 \le x \le 1} [F_n(x) - x]. \tag{4.5}$$

The quantity $\max_{0 \le x \le 1} [F_n(x) - x] = D_n^+$, commonly called the one-sided
Kolmogorov-Smirnov statistic, is an important function of the observations
used in performing statistical tests to decide whether the observed sample
comes from an underlying uniform distribution.

Our objective in this section is to determine the distribution law of $U_n$ and
$V_n$. The remarkable result is that for all n each is uniformly distributed over [0, 1].

To prove this assertion we consider a Poisson process X(t), $0 \le t \le 1$,
with parameter 1. Now divide the interval (0, 1) into r + 1 parts

$$\left(0, \frac{1}{r+1}\right), \ \left(\frac{1}{r+1}, \frac{2}{r+1}\right), \ \ldots, \ \left(\frac{r}{r+1}, 1\right),$$

where r + 1 is a prime number greater than n. (The reason for this specification
will be clear later.) The increments

$$X\!\left(\frac{1}{r+1}\right) - X(0), \ X\!\left(\frac{2}{r+1}\right) - X\!\left(\frac{1}{r+1}\right), \ \ldots, \ X(1) - X\!\left(\frac{r}{r+1}\right)$$

are independent and identically distributed Poisson random variables. We
denote these increments by $W_1, W_2, \ldots, W_{r+1}$, respectively, and define

$$Y_i = W_i - \frac{n}{r+1}, \qquad i = 1, \ldots, r + 1,$$

which are obviously independent and identically distributed. We form the
sequence of partial sums

$$S_k = Y_1 + \cdots + Y_k, \qquad k = 1, 2, \ldots, r + 1,$$

and note that

$$\Pr\{S_i = 0\} = 0 \qquad (i = 1, 2, \ldots, r). \tag{4.6}$$

This is true because $S_i = 0$ implies $(r + 1)X(i/(r + 1)) = ni$. But this cannot
hold, since r + 1 is prime and exceeds both n and i, and therefore divides
neither n nor i. Hence (4.6) is valid.

A similar argument shows that, for any j > i, $\Pr\{S_j - S_i = 0\} = 0$. This
means that with probability 1 none of the partial sums $S_j - S_i$ ($0 \le i < j < r + 1$,
$S_0 = 0$) is zero.

Let

$$N_r = \text{number of positive terms in the sequence } S_1, S_2, \ldots, S_r,$$

$$L_r = \text{smallest } j \text{ for which } S_j = \max(0, S_1, \ldots, S_r).$$

The hypotheses of Lemma 4.1 are fulfilled for the sequence $x_i = S_i - S_{i-1}$,
i = 1, ..., r + 1, whenever no $S_j - S_i$ (j > i) is zero and $S_{r+1} = 0$. This event occurs
with probability 1. According to the lemma

$$\Pr\{L_r = m \mid S_{r+1} = 0\} = \Pr\{N_r = m \mid S_{r+1} = 0\} = \frac{1}{r+1} \qquad (m = 0, 1, \ldots, r). \tag{4.7}$$

Now under the condition X(1) = n, X(t)/n is distributed like $F_n(t)$, $0 \le t \le 1$.
(This is the identification made in Section 1.) Hence, we can define $U_n$, $V_n$ for
X(t)/n, $0 \le t \le 1$, under the condition X(1) = n. We claim that

$$\left| U_n - \frac{N_r}{r+1} \right| \le \frac{A}{r+1}, \qquad \left| V_n - \frac{L_r}{r+1} \right| \le \frac{B}{r+1}, \tag{4.8}$$

where A and B are constants which depend on n but not on r.
A complete justification of the first inequality of (4.8) goes as follows. If

$$S_{k+1} = X\!\left(\frac{k+1}{r+1}\right) - \frac{n(k+1)}{r+1} > 0,$$

and if no jump occurs in the interval (k/(r + 1), (k + 1)/(r + 1)), then certainly
for the whole segment $k/(r + 1) \le t \le (k + 1)/(r + 1)$, we have

$$X(t) - nt > 0. \tag{4.9}$$

Since X(t) has precisely n jumps under the condition X(1) = n there are
at most n intervals each of length not exceeding 1/(r + 1) for which $S_i > 0$
does not imply that

$$X(t) - nt > 0, \qquad \frac{i-1}{r+1} \le t \le \frac{i}{r+1},$$

holds. But $N_r/(r + 1)$ is equal to the number of positive $S_i$ multiplied by a length
1/(r + 1). In view of the preceding analysis we may conclude that $N_r/(r + 1)$
can differ from $U_n$ by a quantity no larger than n/(r + 1). Thus $A \le n$. A similar
argument leads to the second relation of (4.8). Thus under the condition X(1) =
n, both absolute values in (4.8) converge in probability to zero as $r \to \infty$. Since
$N_r/(r + 1)$ and $L_r/(r + 1)$ are asymptotically uniformly distributed over (0, 1)
as $r \to \infty$ (r + 1 tends to $\infty$ through prime values), this completes the proof of
the following theorem.


Theorem 4.1. Let $F_n(x)$ denote the empirical distribution function based on a
sample of size n from the uniform distribution on [0, 1]. Consider the random
variables $U_n$ and $V_n$ defined in (4.4) and (4.5). Then

$$\Pr\{U_n \le x\} = \Pr\{V_n \le x\} = x \qquad (0 \le x \le 1).$$
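
Theorem 4.1 can be explored by simulation. The sketch below is an illustration under stated assumptions: $U_n$ is taken as the total length of the set where $F_n(x) > x$, $V_n$ as the smallest x at which $F_n(x) - x$ is largest, and the names `u_and_v` and `simulate` are hypothetical. It uses the fact that $F_n$ is constant between consecutive order statistics, so both quantities reduce to finite computations on the sorted sample.

```python
import random


def u_and_v(sample):
    """Return (U_n, V_n) for one sample from the uniform distribution on (0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    u = 0.0
    best_gap, v = 0.0, 0.0               # F_n(0) - 0 = 0 is the value at x = 0
    for k in range(1, n + 1):
        left = xs[k - 1]                 # F_n(x) = k/n on [left, right)
        right = xs[k] if k < n else 1.0
        # part of [left, right) on which k/n exceeds x
        u += max(0.0, min(right, k / n) - left)
        gap = k / n - left               # F_n(x) - x at the jump point
        if gap > best_gap:               # strict: keeps the smallest maximizer
            best_gap, v = gap, left
    return u, v


def simulate(n, trials=20_000, seed=1):
    rng = random.Random(seed)
    return [u_and_v([rng.random() for _ in range(n)]) for _ in range(trials)]


pairs = simulate(5)
for name, values in (("U_n", [u for u, _ in pairs]), ("V_n", [v for _, v in pairs])):
    mean = sum(values) / len(values)
    below = sum(value <= 0.25 for value in values) / len(values)
    print(name, round(mean, 3), round(below, 3))   # expect roughly 0.5 and 0.25
```

Agreement with the uniform law should hold only up to sampling error; a formal goodness-of-fit test is left out of this sketch.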




Elementary Problems 


1. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent, identically distributed random
variables with continuous distribution function F(x). Denote by $X_{nk}$ the kth smallest
among the variables $X_1, X_2, \ldots, X_n$. Thus

$$X_{n1} \le X_{n2} \le \cdots \le X_{nn}.$$

Find the distribution function $F_{nk}(x)$ of $X_{nk}$.

Solution:

$$F_{nk}(x) = \sum_{l=k}^{n} \binom{n}{l} [F(x)]^l [1 - F(x)]^{n-l}.$$


2. Using the notation of Problem 1, show that

$$F_{nk}(x) = k \binom{n}{k} \int_{1 - F(x)}^{1} t^{n-k} (1 - t)^{k-1} \, dt.$$

Hint: Integrate by parts.


3. Using the notation of Problem 1, find
$\Pr\{X_{nk} > y, \ X_{n+1,k} \le x\}$ for x < y.

Solution:

$$\binom{n}{k-1} [F(x)]^k [1 - F(y)]^{n-k+1}.$$


4. Let $X_i^*$, i = 1, 2, 3, ..., n, be the order statistics from a uniform (0, 1) distribution. Show
that $\log X_k^*$ has the same distribution as $-\sum_{j=k}^{n} \theta_j / j$, where the $\theta_j$ are independent random
variables with the exponential distribution with parameter 1.


Hint: Use the relationship between Poisson processes and order statistics of the uniform 
distribution. 


5. Prove that $(X_i^* / X_{i+1}^*)^i$, i = 1, 2, ..., n, where $X_{n+1}^*$ is defined to be 1, are independent
uniform (0, 1) random variables, where $X_1^*, \ldots, X_n^*$ are order statistics corresponding to a
sample of size n from the uniform distribution.


6. Let $X_0, X_1, X_2, \ldots$ be independent identically distributed positive random variables
having the continuous distribution function F(x). Let N be the value of the first subscript n
for which $X_n > X_0$. Find $\Pr\{N = k\}$ for k = 1, 2, ... and E[N].

(Meaning of the problem: $X_0, X_1, \ldots$ are offers for the car you are trying to sell. E[N]
is the average time you must wait before you receive an offer better than your first offer.)
Problems


1. If $X_1, X_2, \ldots, X_n$ are independent observations from the uniform (0, 1) distribution
find the distribution of $P = \prod_{i=1}^{n} X_i$.

Hint: Either calculate directly or introduce the new variables

$$X_i = \exp(-Y_i), \qquad i = 1, 2, \ldots, n.$$

Answer:

$$\Pr\{P \le x\} = \frac{(-1)^{n-1}}{(n-1)!} \int_0^x (\log t)^{n-1} \, dt, \qquad 0 < x \le 1.$$


2. Let $X_1, X_2, \ldots, X_k$ be independent identically distributed random variables with
distribution function F(x) and density function f(x). Let $X_1^* \le X_2^* \le \cdots \le X_k^*$ be the
corresponding order statistics. Let $Y_1, Y_2, \ldots$ be a sequence of independent, identically
distributed random variables from the same distribution as the $\{X_i\}$. Define N as the
integer-valued random variable for which

$$Y_i < X_k^*, \quad i = 1, 2, \ldots, N - 1, \quad \text{but } Y_N > X_k^*.$$

Similarly, define M as the random variable for which

$$Y_i < X_{k-1}^*, \quad i = 1, 2, \ldots, M - 1, \quad \text{but } Y_M > X_{k-1}^*.$$

Find the distributions of N and M, respectively.
Answer:

$$\Pr\{N = n\} = \frac{k}{(n + k)(n + k - 1)},$$

$$\Pr\{M = n\} = \frac{2k(k - 1)}{(n + k)(n + k - 1)(n + k - 2)}.$$


Problems 3-10 are based on the following structure. 


Let $X_1, \ldots, X_{n-1}$ be independent random variables each uniformly distributed over
the interval (0, 1). Let

$$X_1^* \le X_2^* \le \cdots \le X_{n-1}^*$$

be the ordered values of the $X_i$. These order statistics partition the unit interval into n
disjoint intervals whose lengths are

$$U_1 = X_1^*, \quad U_2 = X_2^* - X_1^*, \quad \ldots, \quad U_n = 1 - X_{n-1}^*.$$

Let

$$U_1^* \le U_2^* \le \cdots \le U_n^*$$

denote the ordered values of the $U_i$.
It is implicitly established in the text that
(a) The random vector

$$\left(\frac{Y_1}{S_n}, \ldots, \frac{Y_n}{S_n}\right),$$

where the $Y_i$ are independent r.v.'s exponentially distributed with parameter 1, and the r.v.
$S_n = Y_1 + \cdots + Y_n$ are independent.
(b) The random vectors $(U_1, \ldots, U_n)$ and

$$\left(\frac{Y_1}{S_n}, \ldots, \frac{Y_n}{S_n}\right)$$

are identically distributed.
(c) If S is a random variable distributed as $S_n$ and independent of $(U_1, U_2, \ldots, U_n)$,
then the random vectors $(SU_1, SU_2, \ldots, SU_n)$ and $(Y_1, \ldots, Y_n)$ are identically distributed.


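3. Compute $E[U_1^{i_1} U_2^{i_2} \cdots U_n^{i_n}]$, where $i_1, i_2, \ldots, i_n$ are nonnegative integers.
Answer:

$$\frac{(n - 1)! \, i_1! \, i_2! \cdots i_n!}{(n - 1 + i_1 + i_2 + \cdots + i_n)!}.$$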


4. Prove that

$$\frac{X_1^*}{X_2^*}, \ \frac{X_2^*}{X_3^*}, \ \ldots, \ X_{n-1}^*$$

are independent random variables.
Hint: Consult Elementary Problem 5.
5. Let $S_k = Y_1 + \cdots + Y_k$. Show that

$$\frac{S_1}{S_2}, \ \frac{S_2}{S_3}, \ \ldots, \ \frac{S_{n-1}}{S_n}$$

are independent random variables.
6. If i, j, k, and l are distinct indices show that

$$\frac{U_i}{U_j} \quad \text{and} \quad \frac{U_k}{U_l}$$

are independent random variables.
7. Find the distribution of 
U+ +U, 


Uis eni Ue 


Answer: Ratio of two independent r.v.'s each following a gamma distribution.
8. Show that

$$F(t_1, \ldots, t_n) = \Pr\{U_1 > t_1, U_2 > t_2, \ldots, U_n > t_n\} = \begin{cases}
1 & \text{if } t \le 0, \\
(1 - t)^{n-1} & \text{if } 0 \le t \le 1, \\
0 & \text{if } t > 1,
\end{cases}$$

where $t = t_1 + \cdots + t_n$.


9. Let $Y_1^* \le Y_2^* \le \cdots \le Y_n^*$ be the ordered values of $Y_1, \ldots, Y_n$. Find the joint distribution
of

$$Z_1 = nY_1^*, \quad Z_2 = (n - 1)(Y_2^* - Y_1^*), \quad \ldots, \quad Z_n = (Y_n^* - Y_{n-1}^*).$$

Hint: Cf. Elementary Problem 5.
Answer: Independent exponential random variables with parameter 1.


10. Show that $[U_1, U_2, \ldots, U_n]$ and $[nU_1^*, (n - 1)(U_2^* - U_1^*), \ldots, U_n^* - U_{n-1}^*]$ have
the same joint distribution.


11. Using the notation of Elementary Problem 1, we define the rank of $X_i$ in the set
$X_1, X_2, \ldots, X_n$ to be r if $X_i = X_{nr}$. Now, let $R_j$ be the rank of $X_j$ in the set $X_1, X_2, \ldots, X_j$.
Show that the random variables $R_1, R_2, \ldots, R_n$ are independent and

$$\Pr\{R_n = r\} = \frac{1}{n} \quad \text{for } r = 1, 2, \ldots, n.$$

Hint: Prove that

$$\Pr\{R_1 = r_1, R_2 = r_2, \ldots, R_n = r_n\} = \frac{1}{n!}$$

for all choices $r_1 = 1$; $r_2 = 1, 2$; $r_3 = 1, 2, 3$; ...; $r_n = 1, 2, \ldots, n$.


12. Let $X_1, X_2, \ldots$ be independent, identically distributed, positive random variables
with continuous distribution function F(x). Prove that

$$\Pr\{X_k > \max(X_1, X_2, \ldots, X_{k-1})\} = \frac{1}{k}.$$
13. Let $X_1, X_2, \ldots$ be independent identically distributed r.v.'s and define

$$M_n = \max(X_1, X_2, \ldots, X_n).$$

Let $N_n(\lambda, \mu)$ ($0 < \lambda < \mu$) be the number of terms in the sequence

$$X_{[n\lambda]}, \ X_{[n\lambda]+1}, \ X_{[n\lambda]+2}, \ \ldots, \ X_{[n\mu]}$$

for which $X_i = M_i$. Here, as customary, $[n\lambda]$ denotes the greatest integer less than or equal
to $n\lambda$ and $[n\mu]$ has an analogous connotation. Show that the probability generating
function of $N_n(\lambda, \mu)$ is

$$E[z^{N_n(\lambda, \mu)}] = \exp\left\{ \sum_{i=[\lambda n]}^{[\mu n]} \log\left[1 - \frac{1 - z}{i}\right] \right\}.$$

Hint: Introduce the r.v.'s

$$C(i) = \begin{cases} 1 & \text{if } X_i = M_i, \\ 0 & \text{otherwise.} \end{cases}$$

Use Problem 11 to deduce that the C(i) are independent r.v.'s and then calculate

$$\prod_{i=[n\lambda]}^{[n\mu]} E[z^{C(i)}].$$


14. (continuation). Show that the limit distribution of $N_n(\lambda, \mu)$ as $n \to \infty$ is a Poisson
law with parameter $\log(\mu/\lambda)$.


15. Let $W_{[n\mu]}$ = the index of the last maximum among the sequence $X_1, X_2, \ldots, X_{[n\mu]}$.
Prove that $W_{[n\mu]}/n\mu$ has a limit distribution as $n \to \infty$ that is uniform on (0, 1).


16. Let $W_1, W_2, \ldots$ be r.v.'s independent and uniformly distributed on (0, 1). Let N be
the number of indices i satisfying $t < \prod_{k=1}^{i} W_k < 1$, where t is a fixed number between 0
and 1. Find the distribution function of N.


Answer: N follows a Poisson law with parameter — log t. 


17. Let $X_1, X_2, \ldots, X_n$ be n observations from a continuous distribution function
F(x). Let $N_n$ denote the number of indices k for which $X_k > \max(X_1, X_2, \ldots, X_{k-1})$,
k = 1, 2, ..., n. Show that the generating function of $N_n$ is

$$\sum_{r=1}^{n} s^r \Pr\{N_n = r\} = \frac{(s + n - 1)(s + n - 2) \cdots (s + 1)s}{n!}.$$

By convention, $X_1$ satisfies the requirement, so that $N_1 = 1$.
18. In the previous problem, establish the relation

$$\lim_{n \to \infty} \frac{E[N_n]}{\log n} = 1.$$


19. Consider a sequence $X_1, X_2, \ldots$ of independent, identically distributed r.v.'s with
continuous distribution F(x). A value $X_k$ is called "outstanding" if

$$X_k > \max(X_1, X_2, \ldots, X_{k-1}).$$

(By convention $X_1$ is outstanding.)
Let $N_n$ be the number of outstanding values among the sequence $X_1, X_2, \ldots, X_n$.
Let $V_k$ be the index of the kth outstanding value.
Prove the relation

$$\Pr\{V_{k+1} > n\} = \Pr\{N_n < k + 1\}.$$
20. (continuation). Prove the formula

$$\Pr\{N_n = r\} = \frac{1}{n} \sum_{1 \le k_1 < k_2 < \cdots < k_{r-1} < n} \frac{1}{(n - k_{r-1})(n - k_{r-2}) \cdots (n - k_1)}.$$


21. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be collections of m and n independent r.v.'s
with the same continuous distribution function F(x). Let N be the number of $Y_i$ in the
second sample not exceeding $X_k^*$, the kth-order statistic from the first sample. Show that
the probability density function of N is

$$\Pr\{N = l\} = \frac{k}{k + l} \cdot \frac{\binom{m}{k} \binom{n}{l}}{\binom{m+n}{k+l}} \quad \text{for } l = 0, 1, \ldots, n.$$

Hint: Use the result of Problem 2.


22. Consider a random sample of size n taken from a population with continuous density
function f(x). Take another random sample independent of the first one and also of size n
from the same population. Let $U^{(r)}$ be the random variable associated with the number of
values in the second sample which exceed the rth smallest value in the first sample. Similarly,
let $V^{(s)}$ be the random variable associated with the number of values in the second sample
which exceed the sth largest value in the first sample.

Prove that

$$\Pr\{U^{(r)} = x\} = \frac{\binom{n-x+r-1}{r-1} \binom{n-r+x}{x}}{2\binom{2n-1}{n-1}} \quad \text{for } x = 0, 1, \ldots, n,$$

and $\Pr\{U^{(r)} = x\} = \Pr\{V^{(s)} = x\}$ where s = n - r + 1.

Hint: Use the result of Problem 2.

23. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two independent random samples from
distributions F(x) and G(y), respectively. Assume that F and G are continuous distribution
functions and

$$F = G^\delta, \qquad \delta > 0.$$

Let

$$X_{1m} \le X_{2m} \le \cdots \le X_{mm}$$

and

$$Y_{1n} \le Y_{2n} \le \cdots \le Y_{nn}$$

be the order statistics of the corresponding samples.
Find the probability

$$\Pr\{X_{m-i,m} < Y_{nn} < X_{m-i+1,m} \mid Y_{nn} = t\}.$$

Answer:

$$\binom{m}{i} [1 - G^\delta(t)]^i [G^\delta(t)]^{m-i}.$$
24. Under the conditions of Problem 23 find the probability that exactly i of the X's
are larger than all of the Y's.

Answer:

$$\binom{m}{i} \frac{n}{\delta} \cdot \frac{\Gamma((n/\delta) + m - i) \, \Gamma(i + 1)}{\Gamma((n/\delta) + m + 1)}.$$

25. Under the conditions of Problem 23 find the probability that exactly i of the X's
are smaller than all of the Y's.

Answer:

$$\binom{m}{i} n! \sum_{r=0}^{m-i} (-1)^r \binom{m-i}{r} \frac{\Gamma(\delta(i + r) + 1)}{\Gamma(\delta(i + r) + n + 1)}.$$


26. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$ be two independent random samples from the
distributions F(x) and G(y), respectively. Assume that F(x) and G(y) are strictly increasing
and continuous and let

$$p = \Pr\{Y < X\},$$

where X and Y are independent and distributed according to F and G, respectively. Define

U = number of pairs $(X_i, Y_j)$ such that $Y_j < X_i$.

Show that

$$E[U] = mnp.$$

Hint: Write

$$U = \sum_{j=1}^{n} \sum_{i=1}^{m} U_{ij}, \quad \text{where } U_{ij} = \begin{cases} 1 & \text{if } Y_j < X_i, \\ 0 & \text{otherwise.} \end{cases}$$
27. In the preceding problem show that

$$\operatorname{Var}(U) = mn\{(m - 1)\alpha + (n - 1)\beta - (m + n - 1)p^2 + (2m - 1)p - (m - 1)\},$$

where

$$\alpha = \int_0^1 [L(t)]^2 \, dt \quad \text{and} \quad \beta = 1 - 2\int_0^1 tL(t) \, dt,$$

for $L(t) = F(G^{-1}(t))$ and $G^{-1}$ the inverse function of G.


28. Let $\xi_1, \xi_2, \ldots, \xi_n$ be n independent random variables uniformly distributed on [0, t].
Find the expectation and the variance of their minimum.


Answer: Expectation = t/(n + 1); variance = nt?/[(n + 1)?(n + 2)]. 


29. Let $X_1, X_2, \ldots, X_n$ be independent r.v.'s, possessing the density functions
$f_1(x), f_2(x), \ldots, f_n(x)$ and the corresponding cumulative distribution functions $F_1(x)$,
$F_2(x), \ldots, F_n(x)$ respectively.
Define

$$Z = \min(X_1, X_2, \ldots, X_n)$$

and

N = first index where $\min(X_1, \ldots, X_n)$ is achieved.

Let

$$H_k(z) = \Pr\{N = k, \ Z \le z\}.$$

Prove that

$$H_k(z) = \int_{-\infty}^{z} \left( \prod_{j \ne k} [1 - F_j(t)] \right) f_k(t) \, dt.$$

30. Players are all of equal skill and in a contest the probability is 1/2 that a specified
one of the two contestants will be the victor. A group of $2^n$ players are paired off against
each other at random. The $2^{n-1}$ winners are again paired off randomly, and so on, until
one winner remains. If at the start we designate two players as A and B, what is the
probability that they will ever contest each other?

Answer: $p_n = 1/2^{n-1}$.


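31. Players A and B match pennies N times in a fair game. What is the probability of never
being even?

Answer:

$$\Pr(\text{no tie}) = \binom{N-1}{[N/2]} \left(\frac{1}{2}\right)^{N-1},$$

where [N/2] denotes the greatest integer less than or equal to N/2.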


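32. Let $X_1, X_2, \ldots$ be a sequence of random variables with $X_n$ uniformly distributed over
the interval $(X_{n-1}, 1)$, $X_0 = 0$. Prove that

$$\prod_{i=1}^{n} E[X_i] = \prod_{i=1}^{n} \left(1 - \frac{1}{2^i}\right).$$

Hint: Prove by induction that the density of $X_i$ is

$$f_i(x) = \frac{[-\log(1 - x)]^{i-1}}{(i - 1)!}.$$

33. Under the conditions of Problem 32, show that $-\sum_{i=1}^{n} E[\log X_i]$ is uniformly
bounded in n.

Hint: Verify that

$$\begin{aligned}
-\sum_{i=1}^{n} E[\log X_i]
&= -\int_0^\infty \log(1 - e^{-\omega}) \, e^{-\omega} \sum_{i=0}^{n-1} \frac{\omega^i}{i!} \, d\omega \\
&\le -\int_0^\infty \log(1 - e^{-\omega}) \, d\omega = \sum_{k=1}^{\infty} \frac{1}{k^2}.
\end{aligned}$$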


34. Under the conditions of Problem 32 show that

$$E\left[\prod_{i=k}^{\infty} X_i \,\Big|\, X_{k-1} = \xi\right] = e^{\xi - 1}.$$

Hint: Let $\varphi(\xi) = E[\prod_{i=k}^{\infty} X_i \mid X_{k-1} = \xi]$. (Why is $\varphi$ independent of k?) Use the result of
Problem 33 to show that $\varphi(\xi) \ne 0$ for $0 \le \xi < 1$.
Next derive the functional equation

$$\varphi(\xi) = \frac{1}{1 - \xi} \int_\xi^1 \eta \, \varphi(\eta) \, d\eta,$$

and solve. There are two solutions, one of which is continuous at 1, the other not. It has to
be proved that $\varphi(\xi) = 0$ ($0 \le \xi < 1$), $\varphi(1) = 1$ is not the desired solution.


35. Under the conditions of Problem 32 show that $E[\prod_{i=1}^{\infty} X_i] \ne \prod_{i=1}^{\infty} E[X_i]$ and find
which is larger.


Answer:
\[ E\Bigl[\prod_{i=1}^{\infty} X_i\Bigr] > \prod_{i=1}^{\infty} E[X_i]. \]
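
The inequality of Problem 35 can be made concrete numerically. The sketch below rests on two facts stated here as assumptions of the illustration: $E[X_i] = 1 - 2^{-i}$ (because $1 - X_i$ is a product of $i$ independent uniform $(0,1)$ variables), and $E[\prod_i X_i] = e^{-1}$ (the case $\xi = 0$ of Problem 34). The function names, truncation depth, and seed are illustrative choices only.

```python
import math
import random


def product_of_means(terms=60):
    """prod_{i=1}^{terms} E[X_i], using E[X_i] = 1 - 2**(-i)."""
    prod = 1.0
    for i in range(1, terms + 1):
        prod *= 1.0 - 0.5**i
    return prod


def simulate_mean_of_product(trials=20_000, depth=40, seed=0):
    """Monte Carlo estimate of E[X_1 X_2 ... X_depth]; later factors are almost 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, prod = 0.0, 1.0
        for _ in range(depth):
            x = rng.uniform(x, 1.0)  # X_n is uniform on (X_{n-1}, 1)
            prod *= x
        total += prod
    return total / trials


if __name__ == "__main__":
    print("product of means :", product_of_means())
    print("e^{-1}           :", math.exp(-1.0))
    print("simulated E[prod]:", simulate_mean_of_product())
```

The simulated value should sit near $e^{-1}$ and clearly above the product of the means.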


*36. If $n$ points are chosen "at random" (i.e., according to the uniform distribution)
on a line of length $L$, show that if $0 \le d \le L/(n - 1)$, the probability is $\{[L - (n - 1)d]/L\}^n$
that no two points will be closer together than the distance $d$.


Hint: Establish and interpret the relation
\[ \frac{n!}{L^n} \int_{(n-1)d}^{L} dx_n \int_{(n-2)d}^{x_n - d} dx_{n-1} \cdots \int_{d}^{x_3 - d} dx_2 \int_{0}^{x_2 - d} dx_1 = \left[1 - \frac{(n-1)d}{L}\right]^n. \]
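
A quick simulation can make the formula of Problem 36 plausible before it is proved. The sketch below is an illustration only; the function names, seed, and parameter values are assumptions made here, and $d \ge 0$ is assumed throughout.

```python
import random


def exact_no_close_pair(n, length, d):
    """{[L - (n-1)d]/L}^n, taken as 0 when (n-1)d exceeds L (d >= 0 assumed)."""
    if (n - 1) * d > length:
        return 0.0
    return ((length - (n - 1) * d) / length) ** n


def simulate_no_close_pair(n, length, d, trials=100_000, seed=0):
    """Fraction of samples of n uniform points whose smallest gap is at least d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = sorted(rng.uniform(0.0, length) for _ in range(n))
        if all(b - a >= d for a, b in zip(pts, pts[1:])):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("exact    :", exact_no_close_pair(3, 1.0, 0.1))
    print("simulated:", simulate_no_close_pair(3, 1.0, 0.1))
```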


*37. Suppose $n$ points are chosen independently according to a uniform distribution on
the circumference of a circle. Show that the probability that a specified $j$ of the $n$ arcs
determined by consecutive points will be longer than $\alpha$ is
\[ \pi_j = \begin{cases} \left(1 - \dfrac{j\alpha}{t}\right)^{n-1} & \text{if } 0 \le j\alpha \le t, \\ 0 & \text{otherwise,} \end{cases} \]
where $t$ is the circumference of the circle.


Hint: Because of circular symmetry select any one of the $n$ points as the origin and
assume that the other $n - 1$ points were chosen at random independently in the interval
$[0, t]$. Let
\[ 0 \le X_1^* \le X_2^* \le \cdots \le X_{n-1}^* \le t \]
be the order statistics that mark the distances from the origin of the first, second, ...,
$(n - 1)$th point, respectively. Use the method of Problem 16.


*38. In Problem 37, show that the probability that exactly $k$ of the $n$ arcs between
consecutive points will be longer than $\alpha$ is
\[ \binom{n}{k} \sum_{j=k}^{n} (-1)^{j-k} \binom{n-k}{j-k}\, \pi_j, \]
where $\pi_j = (1 - j\alpha/t)^{n-1}$ if $j\alpha \le t$ and $\pi_j = 0$ otherwise.


Hint: Let $V_k$ be the probability that a given $k$ arcs are each longer than $\alpha$ and the remaining
$n - k$ arcs are shorter. Note that the required probability equals $\binom{n}{k} V_k$, $k = 0, 1, \ldots, n$.
Next establish the formula
\[ V_k = \sum_{j \ge k} (-1)^{j-k} \binom{n-k}{j-k}\, \pi_j \]
(recall that $\binom{n}{r} = 0$ for $r > n$).


*39. Let $n$ impulses arrive at a counter at times $t_1, \ldots, t_n$, where $t_1, t_2, \ldots, t_n$ are distributed
as order statistics from a uniform $(0, 1)$ distribution. Following each time the counter
registers an impulse, it has a "dead time" of length $\tau$, in which it will not register, even if it
receives an impulse. The interval $(0, \tau)$ is also a "dead time." Find the probability that the
counter registers the first $k$ impulses which it receives (i.e., $t_i - t_{i-1} \ge \tau$, $i = 1, 2, \ldots, k$,
$t_0 = 0$).


Hint: Use the method of Problem 36.
Answer:
\[ \begin{cases} (1 - k\tau)^n, & k\tau \le 1; \\ 0, & k\tau > 1. \end{cases} \]
*40 (continuation). In Problem 39, find the density $f(y)$ of the r.v. $Y$, which is the time
at which the $n$th particle arrives, given that the counter registers the first $k$ particles it
receives ($n > k$).


Answer:
\[ f(y) = \frac{n (y - k\tau)^{n-1}}{(1 - k\tau)^n} \qquad \text{for } k\tau < y < 1. \]


*41. Let $Z_i = (X_i, Y_i)$, $i = 1, \ldots, n$, denote $n$ pairs of real random variables all
independently and identically distributed with continuous distribution functions $F(x)$. A vector
$Z_i = (X_i, Y_i)$ is said to be admissible if there exists no other $Z_j = (X_j, Y_j)$ for which $X_j \ge X_i$,
$Y_j \ge Y_i$. Let $I_n$ denote the number of admissible vectors in the sequence $Z_1, Z_2, \ldots, Z_n$.
Without loss of generality assume that the $Z_i$ are labeled in such a way that
\[ X_1 \le X_2 \le \cdots \le X_n. \]
Prove that
\[ I_n = \sum_{i=1}^{n} U_i, \]
where $U_i$ are independent random variables defined as
\[ U_i = \begin{cases} 1 & \text{if the rank of } Y_i \text{ in the set } Y_i, Y_{i+1}, \ldots, Y_n \text{ is } n - i + 1, \\ 0 & \text{otherwise.} \end{cases} \]


Hint: Use the result of Problem 11.

*42. Under the conditions of Problem 41, find the probability generating function $g(t)$
of $I_n$ (cf. Problem 17).


Answer:
\[ g(t) = \frac{1}{n!}\, \frac{\Gamma(t + n)}{\Gamma(t)}. \]
*43. Let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed random
variables with distribution function $F(x)$ and density function $f(x)$. Let $n_1$ be the
integer-valued random variable which is the first subscript such that $X_{n_1} > X_1$; then let $n_2$ be the
first subscript $> n_1$ with $X_{n_2} > X_{n_1}$; in general, let $n_r$ be the first subscript $> n_{r-1}$ with
$X_{n_r} > X_{n_{r-1}}$. Find the distribution of $X_{n_r}$.


Hint: Condition on the values of $X_1, X_{n_1}, \ldots, X_{n_{r-1}}$ and thereby prove the formula
\[ \Pr\{X_{n_r} \le z\} = \int_{-\infty}^{z} f(x_r)\, dx_r \int_{-\infty}^{x_r} \frac{f(x_{r-1})}{1 - F(x_{r-1})}\, dx_{r-1} \cdots \int_{-\infty}^{x_1} \frac{f(x_0)}{1 - F(x_0)}\, dx_0. \]
Then evaluate it. Alternatively, employ induction on $r$.


Solution: The density function of $X_{n_r}$ is
\[ \frac{f(z)\, [-\log(1 - F(z))]^r}{r!}. \]


*44. Let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed random
variables with distribution function $F(x)$ and density function $f(x)$. Let $n_i$ be the r.v.'s as
defined in Problem 43. Find the distribution of the random variable $n_r - n_{r-1}$.


Answer: 


5 7 s -F isk A di 
r{n, tegen (= ) k (k + Da 


Problems 45-49 are variations on the ballot problems. 


*45. As a modification of the ballot problem in the text, find the probability that A never
falls behind during the vote counting.


Hint: Consider the more general setup of $c$ cards in an urn. The cards are labeled with
numbers $k_1, k_2, \ldots, k_c$. Let $n_0, n_1, n_2, \ldots$ be the number of zeros, ones, twos, ... among
$k_1, k_2, \ldots, k_c$. Suppose $k_1 + \cdots + k_c = k$ $(0 \le k \le c)$. Let $\nu_i$ $(i = 1, 2, \ldots, c)$ be the r.v.
whose value is equal to the number on the $i$th card drawn. Drawings are made one at a
time without replacement. We introduce the notation
\[ P_j(c, k; n_0, n_1, \ldots) = \Pr\{\nu_1 + \cdots + \nu_r < r + j \text{ for } r = 1, 2, \ldots, c\}. \]
By inserting an extra card labeled 0 into the urn develop the recursion formula


no + 1 


Pj- (c + 1, k; no + 1, n, m,...) = ggg POE rono.) 


= Mi : 
+ 2 ca Pini k — i; no + 1,14, ---5 ni-i Nay, ++.) 
With the method of the text, where $n_0 = a$, $n_2 = b$ $(a + b = c)$, show that the probability
that A never falls behind during the vote counting is precisely
\[ \Pr\{\nu_1 + \cdots + \nu_r < r + 1 \text{ for } r = 1, 2, \ldots, c\}. \]
Answer: $(a - b + 1)/(a + 1)$.


*46. Consider an urn containing $a$ white and $b$ black balls $(a > b)$. We draw all the balls
at random without replacement one after the other. Simultaneously we perform a random
walk on the nonnegative integers. We start at the point 1 and move to the right or left by
one unit if the ball just drawn is white or black, respectively. What is the probability that
state 0 will be reached at some time during the $a + b$ drawings?


Answer: b/(a + 1). 
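
For small urns the answer to Problem 46 can be confirmed exactly by conditioning on the color of each ball drawn. The sketch below is an illustration, not the intended solution method: the function name and the optional `start` argument (which lets the walk begin at a point other than 1, as in Problem 48) are assumptions made here.

```python
from fractions import Fraction
from functools import lru_cache


def hit_zero_probability(a, b, start=1):
    """Exact probability that the walk started at `start` reaches 0 while all
    a white (+1) and b black (-1) balls are drawn without replacement."""

    @lru_cache(maxsize=None)
    def p(white, black, pos):
        if pos == 0:
            return Fraction(1)          # the origin has been reached
        if white + black == 0:
            return Fraction(0)          # urn exhausted without reaching 0
        total = white + black
        prob = Fraction(0)
        if white:
            prob += Fraction(white, total) * p(white - 1, black, pos + 1)
        if black:
            prob += Fraction(black, total) * p(white, black - 1, pos - 1)
        return prob

    return p(a, b, start)


if __name__ == "__main__":
    a, b = 5, 3
    print(hit_zero_probability(a, b), Fraction(b, a + 1))
```

Because the computation uses exact rational arithmetic, the two printed fractions should coincide.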
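*47. Let $a$ and $b$ be positive integers satisfying $a > b$. Prove the following identities:
\[ \text{(i)} \quad \sum_{k=0}^{b-1} \binom{2k+1}{k} (2k+1)^{-1}\, \frac{a(a-1)(a-2)\cdots(a-k+1)\; b(b-1)\cdots(b-k)}{(a+b)(a+b-1)\cdots(a+b-2k)} = \frac{b}{a+1}, \]
\[ \text{(ii)} \quad \sum_{r=1}^{b} \frac{a(a-1)\cdots(a-r+1)\; b(b-1)\cdots(b-r+1)}{(a+b)(a+b-1)\cdots(a+b-2r+1)} \binom{2r}{r} \frac{r}{r+1} \cdot \frac{a-b}{a+b-2r} = \frac{b}{a+1}. \]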


Hint: (i) Consider the setup of Problem 46. Break up the event that a visit to the origin 
occurs at some time in terms of the first time a visit to the origin occurs. 
(ii) This is derived by considering the last time of equality of white and black draws. 
The results of the ballot problem, Problem 46, and the calculation of formula (6.12) 
of Chapter 10 are to be used. 


*48. In the setup of Problem 46, prove that the probability of a visit to the origin is
\[ p_n(a, b) = \frac{b(b-1)\cdots(b-n+1)}{(a+1)(a+2)\cdots(a+n)}, \]
starting the random walk at the point $n$. Here $n$ is a positive integer not exceeding $b$.


Hint: Find a recursion formula for $p_n(a, b)$. Then use induction on $n$ coupled with the result
of Problem 46 that $p_1(a, b) = b/(a + 1)$.

*49. Let $F_n(x)$ be the empirical cumulative distribution function of $n$ observations from
the uniform distribution on $(0, 1)$. Compute the probability
\[ P_n(a, \gamma) = \Pr\{F_n(x) \le a + \gamma x \text{ for all } 0 \le x \le 1\}, \]
where $(n - 1)/n \le a < 1$, $\gamma > 0$, $\gamma + a > 1$.
Answer: $1 - ((1 - a)/\gamma)^n$.


*50. Under the same conditions as in Problem 49, except that $a < (n - 1)/n$, prove the
relation
\[ P_n(a, \gamma) = \int_{(1-a)/\gamma}^{1} P_{n-1}\!\left(\frac{n}{n-1}\, a,\ \frac{n}{n-1}\, \gamma t\right) n t^{n-1}\, dt. \]
Hint: Condition on the largest observation.


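*51. Let
\[ C_i(a, \gamma) = \binom{n}{i} \left(\frac{n-i}{n\gamma} - \frac{a}{\gamma}\right)^{n-i} \left(1 - \frac{n-i}{n\gamma} + \frac{a}{\gamma}\right)^{i-1} \left(\frac{\gamma + a - 1}{\gamma}\right). \]
Show that
\[ P_n(a, \gamma) = 1 - \sum_{i=0}^{k} C_i(a, \gamma), \]
where $k$ is the integer defined by
\[ \frac{k}{n} < 1 - a \le \frac{k+1}{n}. \]


Hint: Consider the complementary event that $F_n(x) > a + \gamma x$ occurs for some $x$. This
happens if and only if
\[ F_n\!\left(\frac{n-r}{n\gamma} - \frac{a}{\gamma}\right) \ge \frac{n-r}{n} \qquad \text{for some } r = 0, 1, \ldots, k. \]
Introduce the event $A_i$ that the index $r$ where the inequality
first occurs is $r = i$ $(i = 0, 1, \ldots, k)$. Compute $\sum_{i=0}^{k} \Pr\{A_i\}$.
An alternative method is to use the result of Problem 49 and induction on $n$.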


*52. Consider the ballot problem where candidate A scores $a$ votes and candidate B
scores $b$ votes $(a > b)$. Suppose the ballots are drawn one at a time and denote by $\alpha_r$ and
$\beta_r$ the number of votes recorded for A and B, respectively, among the first $r$ votes. Let
$\Delta_{a,b}$ be the number of times that A strictly leads, i.e., $\Delta_{a,b}$ = the number of subscripts $r$
satisfying the condition $\alpha_r > \beta_r$ $(r = 1, 2, \ldots, a + b)$. Let $\Delta^*_{a,b}$ be the number of subscripts
satisfying $\alpha_r \ge \beta_r$ $(r = 1, 2, \ldots, a + b)$.
Prove that
Pr{A,,=a—b+r} 


fC a 


if a>b+1, 
a+b\ yao (b+m4+ D(a-—b+m-—1) 
a a 
1 
if =b +1, 
2 +1 nigra 


and 
Pr{At, =a—b +7} 
ney Gea 
a-b+1 VWa b-m-1 m 
Gs s b-m(a-b+m+1) ’ 


a 

a—b+1 r= 2, 
a+1 

0, otherwise. 


NOTES 


The point of view adopted in this chapter concerning order statistics and Poisson processes 
follows Rényi [1]. 

The combinatorial methods of the last half of this chapter are mostly based on work 
of L. Takács which has not appeared in book form yet. 

Substantial discussions on order statistics from the point of view of classical statistics 
can be found in Wilks [2]. See also references therein. 


A recent summary of the asymptotic behavior of order statistics is included in the book 
by Galambos [3].


Chapter 14


CONTINUOUS TIME MARKOV CHAINS 


The purpose of this chapter is to introduce the reader to some of the more advanced 
topics and concepts which arise in the study of the theory of continuous parameter 
Markov chains. As before we shall consider only the case of stationary transition 
probabilities. 


1: Differentiability Properties of Transition Probabilities 


Let $X_t$ be a continuous time discrete state Markov process with transition
probability matrix $\|P_{ij}(t)\|_{i,j=0}^{\infty}$. Thus
\[ \Pr\{X(t + s) = j \mid X(s) = i\} = P_{ij}(t). \]
In addition to the usual assumptions on the transition matrix function $\|P_{ij}(t)\|$,
i.e.,

(a) $P_{ij}(t) \ge 0$, $t > 0$,

(b) $\sum_j P_{ij}(t) = 1$, $t > 0$,

(c) $\sum_k P_{ik}(t) P_{kj}(h) = P_{ij}(t + h)$, $t, h > 0$,

we also assume that the $P_{ij}(t)$ are continuous for $t > 0$ and that

(d) $\lim_{t \to 0+} P_{ij}(t) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j \end{cases}$

(see also Problem 3). Such a transition matrix function is often called "standard."
It turns out that conditions (a)-(d) imply a great deal more than might be
expected. One of these results is that the $P_{ij}(t)$ are differentiable for every $t \ge 0$.
We will prove here only the much simpler result that the $P_{ij}(t)$ are differentiable
(i.e., have right-hand derivatives) at $t = 0$. We begin with $P_{ii}(t)$.


Theorem 1.1. For every $i$,
\[ -P'_{ii}(0) = \lim_{t \to 0+} \frac{1 - P_{ii}(t)}{t} \]
exists but may be infinite.


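Proof. First we show that $P_{ii}(t) > 0$ for all $t \ge 0$. In fact, (d) asserts that for
each $i$ there is $\varepsilon > 0$ such that $0 \le t \le \varepsilon$ implies $P_{ii}(t) > 0$. Now (c) can easily
be iterated to give
\[ P_{ij}(t_1 + \cdots + t_n) = \sum_{k_1, \ldots, k_{n-1}} P_{i k_1}(t_1) P_{k_1 k_2}(t_2) \cdots P_{k_{n-1} j}(t_n). \] (1.1)
Letting $t_1 = \cdots = t_n = t/n$, $i = j$, and taking only that term on the right
corresponding to $k_1 = k_2 = \cdots = k_{n-1} = i$, we obtain
\[ P_{ii}(t) \ge [P_{ii}(t/n)]^n. \] (1.2)
For $n$ sufficiently large $t/n < \varepsilon$; hence $P_{ii}(t/n) > 0$ and so $P_{ii}(t) > 0$. Let
$-\log P_{ii}(t) = \varphi(t)$, this being well defined since $P_{ii}(t) > 0$. The inequality
\[ P_{ii}(t + s) \ge P_{ii}(t) P_{ii}(s) \]
holds, as can be proved in the same manner as (1.2). Taking the logarithm on
both sides yields the subadditivity inequality for $\varphi(t)$:
\[ \varphi(t + s) \le \varphi(t) + \varphi(s). \]
Also $\varphi(t) \ge 0$ since $0 < P_{ii}(t) \le 1$. We put
\[ q_i = \sup_{t > 0} \frac{\varphi(t)}{t}; \]
then $0 \le q_i \le \infty$ since $\varphi(t) \ge 0$ for $t > 0$. If $q_i < \infty$, there exists $t_0 > 0$ such
that $\varphi(t_0)/t_0 \ge q_i - \varepsilon$. For each $t$, we write $t_0 = nt + \delta$, where $0 \le \delta < t$. Then
\[ \varphi(t_0) \le \varphi(nt) + \varphi(\delta) \le \varphi((n-1)t) + \varphi(t) + \varphi(\delta) \le \cdots \le n\varphi(t) + \varphi(\delta), \]
and so
\[ q_i - \varepsilon \le \frac{\varphi(t_0)}{t_0} \le \frac{nt}{t_0} \cdot \frac{\varphi(t)}{t} + \frac{\varphi(\delta)}{t_0}. \]
Hence
\[ q_i - \varepsilon \le \liminf_{t \to 0} \left[\frac{nt}{t_0} \cdot \frac{\varphi(t)}{t} + \frac{\varphi(\delta)}{t_0}\right]. \]
But as $t \to 0$, $nt/t_0 \to 1$ and $\varphi(\delta) \to 0$ (since $P_{ii}(\delta) \to 1$ as $\delta \to 0$); hence
\[ \liminf_{t \to 0} \left[\frac{nt}{t_0} \cdot \frac{\varphi(t)}{t} + \frac{\varphi(\delta)}{t_0}\right] = \liminf_{t \to 0} \frac{\varphi(t)}{t}. \]
Now by the definition of $q_i$,
\[ \limsup_{t \to 0} \frac{\varphi(t)}{t} \le q_i. \]
Combining the last three inequalities, we have
\[ q_i - \varepsilon \le \liminf_{t \to 0} \frac{\varphi(t)}{t} \le \limsup_{t \to 0} \frac{\varphi(t)}{t} \le q_i. \]
Since $\varepsilon$ was arbitrary, we have
\[ \liminf_{t \to 0} \frac{\varphi(t)}{t} = \limsup_{t \to 0} \frac{\varphi(t)}{t} = q_i. \]
If $q_i = \infty$, we can replace $q_i - \varepsilon$ by $M$, an arbitrarily large constant, and then
obtain
\[ M \le \liminf_{t \to 0} \frac{\varphi(t)}{t}; \qquad \text{thus} \qquad \infty = \lim_{t \to 0} \frac{\varphi(t)}{t}. \]
In either case we have
\[ \lim_{t \to 0} \frac{\varphi(t)}{t} = q_i. \]
Now
\[ \lim_{t \to 0} \frac{1 - P_{ii}(t)}{t} = \lim_{t \to 0} \frac{1 - e^{-\varphi(t)}}{t} = \lim_{t \to 0} \frac{1 - e^{-\varphi(t)}}{\varphi(t)} \cdot \frac{\varphi(t)}{t} = q_i. \]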
Theorem 1.2. For every $i$ and $j$, $i \ne j$,
\[ P'_{ij}(0) = \lim_{t \to 0+} \frac{P_{ij}(t)}{t} \]
exists and is finite.
Proof. For each fixed $h > 0$, $\|P_{ij}(h)\|$ is a transition probability matrix of a
Markov chain $\{X_{nh}\}$; clearly $P^n_{ij}(h) = P_{ij}(nh)$. We now define ${}_jP^0_{ii}(h) = 1$ and
\[ {}_jP^n_{ii}(h) = \Pr\{X_{nh} = i;\ X_{\nu h} \ne j,\ 1 \le \nu < n \mid X_0 = i\}, \]
\[ f^n_{ij}(h) = \Pr\{X_{nh} = j;\ X_{\nu h} \ne j,\ 1 \le \nu < n \mid X_0 = i\}. \]
Then
\[ P_{ij}(nh) \ge \sum_{\nu=0}^{n-1} {}_jP^{\nu}_{ii}(h)\, P_{ij}(h)\, P_{jj}[(n - \nu - 1)h] \] (1.3)
since each term on the right corresponds to a possible way of going from $i$ to $j$
in $n$ steps (relative to $\|P_{ij}(h)\|$) and these paths are mutually exclusive but not
necessarily exhaustive. The term ${}_jP^{\nu}_{ii}(h) P_{ij}(h)$ is the probability of the event that
the last visit to $i$ before visiting $j$ occurs at trial $\nu$. (The relation (1.3) also appeared
in our discussion of ratio theorems in Chapter 11.) Furthermore
\[ P_{ii}(\nu h) = {}_jP^{\nu}_{ii}(h) + \sum_{m=1}^{\nu-1} f^m_{ij}(h)\, P_{ji}[(\nu - m)h] \]
for similar reasons. The first term is the probability of visiting $i$ at trial $\nu$ without
entering state $j$ in the intervening trials. The sum terms consider the cases where
state $j$ is entered at some intermediate trial. Since
\[ \sum_{m=1}^{\nu-1} f^m_{ij}(h) \le 1, \]
we have
\[ {}_jP^{\nu}_{ii}(h) \ge P_{ii}(\nu h) - \max_{1 \le m < \nu} P_{ji}[(\nu - m)h]. \] (1.4)
Now by property (d) it follows that for every $\varepsilon > 0$ and any preassigned $i$, $j$
$(i \ne j)$ there exists $t_0$ such that
\[ \max_{0 \le t \le t_0} P_{ji}(t) < \varepsilon, \qquad \min_{0 \le t \le t_0} P_{ii}(t) > 1 - \varepsilon, \qquad \min_{0 \le t \le t_0} P_{jj}(t) > 1 - \varepsilon. \]
Hence if $nh < t_0$ and $\nu \le n$, it follows from (1.4) that
\[ {}_jP^{\nu}_{ii}(h) > 1 - 2\varepsilon. \]
Using this estimate in (1.3) we obtain
\[ P_{ij}(nh) \ge (1 - 2\varepsilon) \sum_{\nu=0}^{n-1} P_{ij}(h)(1 - \varepsilon) \ge (1 - 3\varepsilon)\, n P_{ij}(h) \]
or
\[ \frac{P_{ij}(nh)}{nh} \ge (1 - 3\varepsilon) \frac{P_{ij}(h)}{h} \qquad \text{if } nh < t_0. \] (1.5)
Put
\[ q_{ij} = \liminf_{t \to 0} \frac{P_{ij}(t)}{t}. \]
Then (1.5) shows that $q_{ij} < \infty$. In fact, if $q_{ij} = \infty$, we could find $h$ arbitrarily
small for which $P_{ij}(h)/h$ is arbitrarily large; choosing $n'$ so that $t_0/2 \le n'h < t_0$
we would conclude on the basis of (1.5) that $P_{ij}(n'h)/n'h$ would be arbitrarily
large, but at the same time
\[ \frac{P_{ij}(n'h)}{n'h} \le \frac{1}{n'h} \le \frac{2}{t_0}. \]
This contradiction implies that $q_{ij} < \infty$.
The remainder of the proof is purely analytic and is a consequence of (1.5).
From the definition of $q_{ij}$ there exists $t_1 < t_0$ such that
\[ \frac{P_{ij}(t_1)}{t_1} < q_{ij} + \varepsilon. \]
Since $P_{ij}(t)$ is continuous, we can find $h_0$ so small that $t_1 + h_0 < t_0$ and
\[ \frac{P_{ij}(t)}{t} < q_{ij} + \varepsilon \qquad \text{for } t \in I = [t_1 - h_0,\ t_1 + h_0]. \] (1.6)
Now for any $h < h_0$ we determine an integer $n_h$ such that $n_h h \in I$; thus, using
(1.5) and (1.6) we find
\[ (1 - 3\varepsilon) \frac{P_{ij}(h)}{h} \le \frac{P_{ij}(n_h h)}{n_h h} < q_{ij} + \varepsilon, \qquad h < h_0, \]
from which we conclude that
\[ (1 - 3\varepsilon) \limsup_{h \to 0} \frac{P_{ij}(h)}{h} \le q_{ij} + \varepsilon. \]
Since $\varepsilon$ is arbitrary, it follows that
\[ \limsup_{h \to 0} \frac{P_{ij}(h)}{h} \le q_{ij}. \]
The theorem now follows from the definition of $q_{ij}$. ∎


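As an example, in the case of birth and death processes, we have
\[ q_i = \lambda_i + \mu_i, \qquad q_{ij} = \begin{cases} \lambda_i, & j = i + 1, \\ 0, & j \ne i - 1,\ i,\ i + 1, \\ \mu_i, & j = i - 1, \end{cases} \qquad i = 0, 1, \ldots. \]
We claim that generally
\[ \sum_{j \ne i} q_{ij} \le q_i \qquad \text{for all } i. \]
In fact, since
\[ \sum_{j \ne i} P_{ij}(h) = 1 - P_{ii}(h), \]
we have for any finite $N$
\[ \sum_{j=1,\, j \ne i}^{N} P_{ij}(h) \le 1 - P_{ii}(h). \]
Dividing by $h$ and then letting $h \to 0$ leads to the inequality
\[ \sum_{j=1,\, j \ne i}^{N} q_{ij} \le q_i. \]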


Since $N$ is arbitrary and all the terms are nonnegative, the assertion follows.


2: Conservative Processes and the Forward and Backward Differential
Equations


A continuous time Markov chain is said to be "conservative" if
\[ \sum_{j \ne i} q_{ij} = q_i < \infty \qquad \text{for all } i. \]
Notice that a birth and death process is conservative. We now prove that for a
conservative Markov chain not only are all the $P_{ij}(t)$ differentiable, if $q_i < \infty$
$(i \ge 0)$, but they satisfy a set of differential equations known as the backward
Kolmogoroff equations. (For the special case of the birth and death process see
Section 5 of Chapter 4.) We remind the reader, however, that the differentiability
of the $P_{ij}(t)$ follows directly from (a)-(d); it is only that the proof under the
assumption of conservativeness is quite simple. Indeed,
\[ P_{ij}(s + t) - P_{ij}(t) = \sum_k P_{ik}(s) P_{kj}(t) - P_{ij}(t) = \sum_{k \ne i} P_{ik}(s) P_{kj}(t) + [P_{ii}(s) - 1] P_{ij}(t). \]
Dividing by $s$ and letting $s \to 0+$, we obtain formally the backward equations
\[ P'_{ij}(t) = \sum_{k \ne i} q_{ik} P_{kj}(t) - q_i P_{ij}(t), \qquad \text{for all } i. \] (2.1)
To derive these equations rigorously, we must show that
\[ \lim_{s \to 0+} \frac{1}{s} \sum_{k \ne i} P_{ik}(s) P_{kj}(t) = \sum_{k \ne i} q_{ik} P_{kj}(t). \]
Now
\[ \liminf_{s \to 0+} \frac{1}{s} \sum_{k \ne i} P_{ik}(s) P_{kj}(t) \ge \liminf_{s \to 0+} \frac{1}{s} \sum_{k=1,\, k \ne i}^{N} P_{ik}(s) P_{kj}(t) = \sum_{k=1,\, k \ne i}^{N} q_{ik} P_{kj}(t) \]
for any $N > 0$, and so
\[ \liminf_{s \to 0+} \frac{1}{s} \sum_{k \ne i} P_{ik}(s) P_{kj}(t) \ge \sum_{k \ne i} q_{ik} P_{kj}(t). \] (2.2)
On the other hand, for $N > i$,
\[ \sum_{k \ne i} P_{ik}(s) P_{kj}(t) \le \sum_{k=1,\, k \ne i}^{N} P_{ik}(s) P_{kj}(t) + \sum_{k=N+1}^{\infty} P_{ik}(s) = \sum_{k=1,\, k \ne i}^{N} P_{ik}(s) P_{kj}(t) + 1 - P_{ii}(s) - \sum_{k=1,\, k \ne i}^{N} P_{ik}(s). \]
Dividing by $s$ and taking $\limsup_{s \to 0+}$ of both sides, we obtain
\[ \limsup_{s \to 0+} \frac{1}{s} \sum_{k \ne i} P_{ik}(s) P_{kj}(t) \le \sum_{k=1,\, k \ne i}^{N} q_{ik} P_{kj}(t) + q_i - \sum_{k=1,\, k \ne i}^{N} q_{ik}. \]
Letting $N \to \infty$ and using the conservative nature of the system, we see that
\[ \limsup_{s \to 0+} \frac{1}{s} \sum_{k \ne i} P_{ik}(s) P_{kj}(t) \le \sum_{k \ne i} q_{ik} P_{kj}(t). \]
Comparing this inequality with (2.2) we conclude that
\[ \lim_{s \to 0+} \frac{1}{s} \sum_{k \ne i} P_{ik}(s) P_{kj}(t) \]
exists and equals
\[ \sum_{k \ne i} q_{ik} P_{kj}(t). \]
In a similar fashion we can formally derive a set of equations called the
forward equations. We write
\[ P_{ij}(s + t) - P_{ij}(s) = \sum_k P_{ik}(s) P_{kj}(t) - P_{ij}(s) = \sum_k P_{ik}(s) [P_{kj}(t) - \delta_{kj}]. \]
Dividing by $t$ and letting $t \to 0$, we obtain formally
\[ P'_{ij}(s) = \sum_{k \ne j} P_{ik}(s) q_{kj} - P_{ij}(s) q_j, \qquad \text{for all } i, j, \] (2.3)
the forward equations. A discussion of the scope and validity of these equations
is rather more involved than in the case of the backward equations, and we
shall not enter into this question.

Both sets of equations assume a very simple form in matrix notation. Indeed,
consider the infinite matrix $A = \|a_{ij}\|$ defined by
\[ a_{ij} = \begin{cases} q_{ij}, & i \ne j, \\ -q_i, & i = j, \end{cases} \]
called the infinitesimal matrix of the process. The backward equations may be
compactly expressed by the matrix differential equation
\[ P'(t) = A P(t) \]
and the forward equations are of the form
\[ P'(t) = P(t) A, \]
where
\[ P(t) = \|P_{ij}(t)\|. \]
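
For a finite state space the two matrix equations can be checked numerically, since there $P(t) = e^{tA}$ solves both of them. The sketch below is an illustration under that finite-state assumption (the text treats infinite matrices); the 3-state generator `A`, the helper names, and the step sizes are hypothetical choices made here. It builds $P(t)$ by a scaled Taylor series and compares a central-difference derivative with $AP(t)$ and $P(t)A$.

```python
def mat_mul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transition_matrix(a, t, squarings=10, terms=20):
    """Approximate P(t) = exp(tA): Taylor series for tA / 2**squarings, then repeated squaring."""
    n = len(a)
    scale = t / 2**squarings
    small = [[a[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def derivative(a, t, h=1e-4):
    """Central-difference approximation to P'(t)."""
    plus, minus = transition_matrix(a, t + h), transition_matrix(a, t - h)
    n = len(a)
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(n)] for i in range(n)]


def max_abs_diff(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(len(x)) for j in range(len(x[0])))


# Hypothetical conservative generator: off-diagonal entries q_ij >= 0, rows sum to zero.
A = [[-3.0, 2.0, 1.0],
     [1.0, -1.0, 0.0],
     [2.0, 2.0, -4.0]]

if __name__ == "__main__":
    P = transition_matrix(A, 0.7)
    dP = derivative(A, 0.7)
    print("backward residual:", max_abs_diff(dP, mat_mul(A, P)))
    print("forward residual :", max_abs_diff(dP, mat_mul(P, A)))
```

Both residuals should be small, limited only by the finite-difference step.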
3: Construction of a Continuous Time Markov Chain from Its
Infinitesimal Parameters


An interesting and important question in the theory of continuous parameter
Markov chains is the following. Suppose that we are given a set of nonnegative
numbers $\{q_{ij}\}$ with the property
\[ \sum_{j \ne i} q_{ij} \le q_i \qquad \text{for all } i. \]
For uniformity of notation, we sometimes write $q_{ii} = -q_i$, as is done above. Is
there a Markov chain, i.e., a standard transition matrix function $\|P_{ij}(t)\|$, for
which
\[ P'_{ij}(0) = q_{ij}, \qquad i \ne j, \]
and
\[ P'_{ii}(0) = -q_i\,? \]
The problem becomes more specific if we assume that
\[ \sum_{j \ne i} q_{ij} = q_i < \infty \qquad \text{for all } i, \]
for in that case we know that any Markov chain associated with the $\{q_{ij}\}$ must at
least satisfy the backward equations.

The practical importance of these questions rests on the fact (as we have seen 
specifically in the case of birth and death processes; see Chapter 4) that quite 
often a continuous time Markov chain is defined in a manner that enables one 
to derive the backward equations. One then attempts to solve these equations 
in order to calculate the complete transition probability function of the process. 

At the present time, definitive results in the general case are not available.
It is known that under the condition
\[ \sum_{j \ne i} q_{ij} = q_i, \qquad \text{all } i, \]
there exists at least one associated transition matrix function $P(t)$, and if there
exists more than one, then there is an infinity of them. Of course more is known
for special forms of $A = \|q_{ij}\|$, e.g., in the birth and death case. In that special
case, a complete classification of all processes possessing a prescribed
infinitesimal matrix is known. These processes differ mostly in boundary behavior, i.e.,
in the nature of the path functions at $\infty$. We remind the reader in the special
example of birth and death processes of the condition (4.5) of Chapter 4 that $A$
must satisfy in order to assure a unique process.

For the general continuous time Markov chain the problem of classifying
the infinitesimal matrix $A$ and its associated processes is quite complicated,
and we can only refer the interested reader to the literature (see references at the
close of this chapter).

Let $i$ be such that $0 < q_i < \infty$. We now present some interpretations of the
elements of $A$ analogous to the interpretation ascribed to the quantities $\lambda_i + \mu_i$
and $p_i = \lambda_i/(\mu_i + \lambda_i)$ in the case of birth and death processes. Recall that we
formally proved that $(\lambda_i + \mu_i)^{-1}$ is the expected waiting time in state $i$ and
$\lambda_i/(\mu_i + \lambda_i)$ is the probability of a transition to state $i + 1$ from $i$, given that a
transition occurs. The results for the case of general continuous time Markov
chains are analogous.

Let $t > 0$ be fixed and $n > 0$ be an arbitrary positive integer. Suppose that
the process is started in state $i$. Then consider
\[ \Pr\{X(\tau) = i \text{ for } \tau = 0,\ t/n,\ 2t/n,\ \ldots,\ t\} = [P_{ii}(t/n)]^n. \]
Since
\[ \frac{1 - P_{ii}(t)}{t} = q_i + o(1) \qquad (o(1) \to 0 \text{ as } t \to 0+), \]
we have
\[ [P_{ii}(t/n)]^n = \left[1 - \frac{q_i t}{n} + o\!\left(\frac{t}{n}\right)\right]^n. \]
Using the expansion for the logarithm of the form $\log(1 - x) = -x + \theta(x) x^2$,
valid for $|x| \le \tfrac{1}{2}$ and $|\theta| \le 1$, with $x$ identified as $t q_i/n + o(t/n)$ and then letting
$n \to \infty$, we obtain
\[ \lim_{n \to \infty} [P_{ii}(t/n)]^n = \exp(-q_i t). \]
But (see Chapter 1, page 33, A First Course) we may consider
\[ \lim_{n \to \infty} \Pr\{X(\tau) = i \text{ for } \tau = 0,\ t/n,\ 2t/n,\ \ldots,\ t\} \]
as just
\[ \Pr\{X(\tau) = i \text{ for all } 0 \le \tau \le t\} \]
(this entails the tacit assumption that the process is separable), which therefore
shows that $\exp(-q_i t)$ is the probability of remaining in state $i$ for at least a
length of time $t$.

In other words, the waiting time distribution in state $i$ is an exponential
distribution with parameter $q_i$. This confirms rigorously for general continuous
time Markov chains what we have heuristically shown in the special case of birth
and death processes (cf. A First Course, page 132).
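
The limit $[P_{ii}(t/n)]^n \to \exp(-q_i t)$ can be watched numerically in a case where $P_{ii}(t)$ is known in closed form. The sketch below is an illustration only: it assumes a two-state chain with rate $\lambda$ from state 0 to state 1 and rate $\mu$ back, so that $q_0 = \lambda$ and $P_{00}(t) = \mu/(\lambda + \mu) + [\lambda/(\lambda + \mu)] e^{-(\lambda + \mu)t}$; the function names and parameter values are choices made here.

```python
import math


def p_stay_two_state(lam, mu, t):
    """P_00(t) for the two-state chain with rates 0 -> 1 at lam and 1 -> 0 at mu."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


def grid_survival(lam, mu, t, n):
    """[P_00(t/n)]^n: the quantity whose limit as n grows is exp(-q_0 t)."""
    return p_stay_two_state(lam, mu, t / n) ** n


if __name__ == "__main__":
    lam, mu, t = 2.0, 3.0, 1.5
    for n in (1, 10, 100, 10_000, 1_000_000):
        print(n, grid_survival(lam, mu, t, n))
    print("exp(-q_0 t) =", math.exp(-lam * t))
```

As $n$ grows the printed values decrease toward $\exp(-\lambda t)$; returns to state 0 between grid points matter less and less.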

A state $i$ for which $0 < q_i < \infty$ is called stable. It is absorbing if $q_i = 0$, which
obviously implies that once state $i$ is entered the process remains there
permanently. Indeed,
\[ \Pr\{X(\tau) = i,\ 0 \le \tau \le t \mid X(0) = i\} = \exp(-q_i t) = 1 \]
for all $t$. On the other hand, if $q_i > 0$, then the waiting time in state $i$ is a random
variable whose distribution function is that of a bona fide exponential and
therefore some transition out of state $i$ occurs in finite time.

A state $i$ for which $q_i = \infty$ is called an instantaneous state. The expected
waiting time in such a state is zero. The name "instantaneous" is appropriate
since the sojourn time in state $i$ is zero, i.e., upon entering the state an immediate
transition out of it occurs.

The theory of continuous time Markov chains with instantaneous states is 
exceptionally complicated, particularly with reference to the path behavior of 
the process. What is even worse is that Markov chains can be constructed with 
only instantaneous states. It is worthwhile to appreciate the technical problems 
inherent in such examples but at the same time it is comforting to know that 
almost all continuous time Markov chains arising in applications have only 
stable states. Actually, in most cases of interest the process under study is 
usually defined by specifying the infinitesimal parameters as the known data. 
To complete the theory, it is then necessary to establish the existence of a process 
(i.e., to determine the sample paths and their probability laws) possessing the 
prescribed infinitesimal matrix. 

Most elementary textbooks and discussions of continuous time Markov 
chains avoid this aspect of the problem (as we do) and concentrate primarily 
on the distribution theory of the process and the calculation of various prob- 
abilistic quantities of interest. The computation of the transition probability 
function for all $t$ is traditionally accomplished by deriving the backward differential
equations and, hopefully, solving them. This approach was the basis of
our treatment of birth and death processes (see Chapter 4). 

Throughout what follows we restrict attention exclusively to continuous
time Markov chains with only stable states. Our next task is to provide an
intuitive meaning for the parameters $q_{ij}$. In fact, if the process is conservative the
elements $q_{ij}/q_i$ $(i \ne j)$ can be interpreted as the conditional probabilities that a
transition out of state $i$ occurs to state $j$. To see this we consider
\[ R_{ij}(h) = \Pr\{X(h) = j \mid X(0) = i,\ X(h) \ne i\}, \qquad j \ne i, \]
and compute $\lim_{h \to 0} R_{ij}(h)$. This is the probability of a transition from state $i$
to state $j$, given that a transition occurs. Now by the very meaning of the symbols
we have
\[ R_{ij}(h) = \frac{P_{ij}(h)}{1 - P_{ii}(h)}. \]
Dividing numerator and denominator by $h$, letting $h \downarrow 0$, and using the results of
Theorems 1.1 and 1.2 produces the desired formula:
\[ \lim_{h \downarrow 0} R_{ij}(h) = \frac{q_{ij}}{q_i}, \qquad j \ne i. \]
The sum (with respect to $j$) equals 1 since the process is conservative.

We remarked above that for any infinitesimal parameters $q_{ij} \ge 0$ $(i \ne j)$ and
$q_i$ $(0 < q_i < \infty)$ $(i \ge 0)$ satisfying $q_i \ge \sum_{j \ne i} q_{ij}$ there may exist one or infinitely
many continuous time Markov chains with the same infinitesimal matrix $A$.
In the case of a conservative infinitesimal matrix (i.e., $q_i = \sum_{j \ne i} q_{ij}$ for all $i$) there
is one special process (the minimal process) for which the sample paths can be
simply described. The construction of the minimal process in the case of birth
and death processes was indicated in Section 4 of Chapter 4. The method is the
same for the general case. We review briefly the essential ideas of this
construction in the general context. A typical realization starting in any state, say
$i$, is of the following form. We take an observation from the exponential
distribution with parameter $q_i$. This determines the initial waiting time in state $i$.
At the end of this wait the particle moves to state $j$ with probability $q_{ij}/q_i$ $(j \ne i)$.
In the new state just entered, say $j'$, it waits a random length of time
(exponentially distributed with parameter $q_{j'}$); concluding its sojourn time in state $j'$, it
then moves to a new state $j''$ with probability $q_{j'j''}/q_{j'}$ $(j'' \ne j')$; it waits there a
random time duration whose distribution law is an appropriate exponential
and then moves again, etc. By this inductive procedure we build up all possible
realizations of the process. By using rather deep measure theory methods one
can determine the transition probability function $\|P_{ij}(t)\|$ having the prescribed
infinitesimal matrix.
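
The inductive construction just described (exponential wait with parameter $q_i$, then a jump to $j$ with probability $q_{ij}/q_i$) can be sketched as a short simulation. Everything named below is an assumption of this illustration: the dictionary layout for the rates, the `max_jumps` safeguard against chains with very many jumps in finite time, and the two-state example used for the sanity check.

```python
import math
import random


def simulate_minimal_chain(rates, start, horizon, rng, max_jumps=1_000_000):
    """Return [(jump_time, state), ...] on [0, horizon) for a conservative chain.

    rates[i][j] holds q_ij for j != i, so q_i is the sum of rates[i].values().
    """
    t, state = 0.0, start
    path = [(0.0, start)]
    for _ in range(max_jumps):
        out = rates.get(state, {})
        q_i = sum(out.values())
        if q_i == 0.0:
            break                           # absorbing state: no further jumps
        t += rng.expovariate(q_i)           # exponential waiting time, parameter q_i
        if t >= horizon:
            break
        targets, weights = list(out.keys()), list(out.values())
        state = rng.choices(targets, weights=weights)[0]  # jump with prob q_ij / q_i
        path.append((t, state))
    return path


def state_at(path, t):
    """State occupied at time t by a path returned from simulate_minimal_chain."""
    current = path[0][1]
    for jump_time, s in path:
        if jump_time <= t:
            current = s
        else:
            break
    return current


if __name__ == "__main__":
    rng = random.Random(0)
    rates = {0: {1: 2.0}, 1: {0: 3.0}}      # hypothetical two-state example
    trials = 20_000
    hits = sum(state_at(simulate_minimal_chain(rates, 0, 1.0, rng), 1.0) == 0
               for _ in range(trials))
    print("simulated P_00(1):", hits / trials)
    print("closed form      :", 0.6 + 0.4 * math.exp(-5.0))
```

For this two-state example the closed form $P_{00}(t) = \mu/(\lambda + \mu) + [\lambda/(\lambda + \mu)]e^{-(\lambda+\mu)t}$ gives a value to compare the simulated frequency against.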

Another way of describing the minimal process is as follows: The transition
probability matrix $\|P_{ij}(t)\|$ is determined as the probabilities of the various
transitions, given that only a finite number of steps have occurred. More
specifically we introduce the quantity $P_{ij}(t; N)$, which represents the probability
of a transition from $i$ to $j$ in time $t$ in which at most $N$ transitions occur. In
particular, consistent with the meaning of the infinitesimal parameters, we
have
\[ P_{ij}(t; 0) = \begin{cases} \exp(-q_i t), & i = j, \\ 0, & i \ne j, \end{cases} \]
and we can further develop a recursion formula connecting $P_{ij}(t; N)$ and
$P_{ij}(t; N - 1)$ as follows.

We consider first the term $P_{ii}(t; N)$. Two possibilities arise according to
whether a transition occurs before time $t$ or not. The waiting time in state $i$ is
exponentially distributed with parameter $q_i$ so that with probability $\exp(-q_i t)$
no transition occurs. Suppose that a transition first occurs in the time span $\tau$ to
$\tau + d\tau$ [the probability of this event is $q_i \exp(-q_i \tau)\, d\tau$] and that the state then
changes to state $j \ne i$. (The probability of this last event is $q_{ij}/q_i$.) The probability
of returning to state $i$ in the remaining time span $t - \tau$ in at most $N - 1$
transitions is $P_{ji}(t - \tau; N - 1)$. By the law of total probabilities we have
\[ P_{ii}(t; N) = \exp(-q_i t) + \sum_{j \ne i} \frac{q_{ij}}{q_i} \int_0^t P_{ji}(t - \tau; N - 1)\, q_i \exp(-q_i \tau)\, d\tau. \]
By a similar enumeration of the alternative contingencies we derive the
identity
\[ P_{ij}(t; N) = \sum_{k \ne i} q_{ik} \int_0^t P_{kj}(t - \tau; N - 1) \exp(-q_i \tau)\, d\tau, \qquad i \ne j. \]
It can be proved that
\[ \lim_{N \to \infty} P_{ij}(t; N) = P_{ij}(t) \]
(see references at the close of this chapter).

At this point we feel compelled to apologize to the reader for introducing a 
wealth of concepts of importance in the theory of Markov processes and doing 
very little with them. These concepts are fraught with many subtleties and 
pathologies well beyond the level of this textbook. We can only hope the student 
will pursue these problems further in the many excellent treatises of the subject. 


4: Strong Markov Property 


A rigorous discussion of the sojourn times for a Markov process, or even for 
continuous time Markov chains, is intimately related to the so-called strong 
Markov property. Although a complete discussion of this subject requires the 
use of the language of measure theory, the central idea can be explained in much 
simpler terms. 

To elaborate the relevant ideas we need to introduce a special type of random 
variable associated with a stochastic process, known variously as a random 
variable independent of the future, or a Markov time, or stopping time (we shall 
use “Markov time”). Let o be a nonnegative random variable associated with 
a given continuous parameter process {X,}, 0 < t< œ; in other words, 
associate with each sample function X, a nonnegative (and possibly infinite) 
number, which we denote by o(X,). The random variable a is said to bea Markov 
time relative to {X,} if it has the following property: 

If X, and Y, are two sample functions of the process such that X, = Y, for 
0 <t < sand o(X,) < s, then o(X,) = o(Y). 

In descriptive language, which is useful in understanding the essential 
properties of such random variables, we can say that the random variable o 
scans the sample function starting from t = 0 up until some time og which 
depends only upon the values of the sample function under consideration in 
the interval [0, co], and the time co at which o stops scanning is the value o(X,).t 

if 


t Strictly speaking, this description is slightly more restrictive than the precise definition given 
earlier, owing to the fact that we had o(X,) < s rather than o(X,) < s. This restriction can be avoided 
by saying that for cach ¢ > 0 the value of o(X,), if o stops scanning at oo. lies in the interval 
(fo = E On) 
As an example of a Markov time, suppose that we are given a Markov chain
that starts in state $j_0$, i.e., $X(0) = j_0$. If we set

$$\sigma(X_t) = \inf\{t \mid X_t = i\},$$

where $i$ is a fixed state, and the infimum is taken as $+\infty$ if there is no $t$ for
which $X_t = i$, then $\sigma$ is a Markov time. The formal proof is simple. In fact,
suppose for a fixed sample function $X_t$ that $\sigma(X_t) < s$. This means that there is
$t < s$ for which $X_t = i$. Now if $X_t = Y_t$ for $0 \le t \le s$, then $Y_t = i$ and
certainly $\sigma(Y_t) \le \sigma(X_t)$. Hence by symmetry $\sigma(X_t) = \sigma(Y_t)$. The random
variable $\sigma$ is called the first entrance time into the state $i$. Similarly, the
first entrance time into any finite collection $C$ of states not containing
$X(0) = j_0$, defined by

$$\sigma(X_t) = \inf\{t \mid X_t \in C\},$$

is a Markov time, the proof being identical with that of the preceding example.
Trivially $\sigma = \text{constant}$ is a Markov time. The reader should have no difficulty
in constructing other examples of Markov times.
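
The defining property of a Markov time can be checked concretely on step-function sample paths. The following is a minimal sketch, not part of the original text: the function name `first_entrance_time` and the two sample paths are hypothetical, and each path is assumed to be a right-continuous step function given by its jump times and the states entered at those times.

```python
import math

def first_entrance_time(jump_times, states, target):
    """Return inf{t : X_t in target} for a right-continuous step path.

    The path takes the value states[k] on [jump_times[k], jump_times[k+1]).
    Returns math.inf if the target set is never entered.
    """
    for t, state in zip(jump_times, states):
        if state in target:
            return t
    return math.inf

# Two hypothetical sample paths that agree on [0, 2.0] and differ afterwards.
x_times, x_states = [0.0, 0.7, 1.5, 3.0], [0, 2, 1, 4]
y_times, y_states = [0.0, 0.7, 1.5, 2.5], [0, 2, 1, 0]

s = 2.0
sigma_x = first_entrance_time(x_times, x_states, {1})
sigma_y = first_entrance_time(y_times, y_states, {1})
print(sigma_x < s, sigma_x == sigma_y)
```

Because the first entrance into state 1 occurs before time s on the first path, and the two paths agree up to s, the second path must yield the same value; what either path does after time s is irrelevant.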

Intuitively speaking, the Markov property of stationary Markov processes
asserts the following: If we know the values of $X(s)$ for
$0 \le s_1 < s_2 < \cdots < s_n = t_0$ ($t_0 > 0$ fixed), the probability distribution of

$$X(t_0 + t_1),\ X(t_0 + t_2),\ \ldots,\ X(t_0 + t_n) \qquad (0 < t_1 < t_2 < \cdots < t_n) \tag{4.1}$$

depends only on the value of $X(t_0)$. More exactly, the probability distribution
of (4.1) under the condition that we know the path $X(s)$ at times
$0 \le s_1 < s_2 < \cdots < s_n = t_0$, or even more generally that we know the complete
history of $X(s)$ up to time $t_0$ ($0 \le s \le t_0$), coincides with the probability
distribution of

$$X(t_1),\ X(t_2),\ \ldots,\ X(t_n), \quad \text{given } X(0).$$

In other words, we can calculate the probability law of (4.1) by translating the
time scale so that $t_0 = 0$ and taking the initial point as the value of $X(t_0)$.

It seems intuitively plausible that the same relationship should hold if we
replace the fixed time value $t_0$ by a “Markov time” $\sigma$. More explicitly, suppose
we wish to compute the probability law of

$$X(\sigma + \tau), \quad \text{given } X(\sigma) = x, \tag{4.2}$$

i.e., the probability law of $X(t)$, $\tau > 0$ units after the occurrence of $\sigma$, knowing
the value of $X$ at time $\sigma$. Now the random variable $\sigma$ involves the values of the
path only up to and inclusive of the time $\sigma$ but not beyond $\sigma$, although the value
of $\sigma$ is not necessarily fixed and may vary from path to path. In other words it is
a random time.

It seems reasonable that we may invoke the Markov property at the random
time $\sigma$. It would then follow that the probability law of (4.2) coincides with the
probability law of

$$X(\tau), \quad \text{given } X(0) = x. \tag{4.3}$$
This fact is not a direct consequence of the statement of the Markov property
since the original formulation requires fixed times. The assertion that (4.2) and
(4.3) are indeed governed by the same probability law is the sense of the strong
Markov property. More precisely, if for any Markov time $\sigma$, the probability
distribution of

$$X(t_1 + \sigma),\ X(t_2 + \sigma),\ \ldots,\ X(t_n + \sigma) \qquad (t_1 < t_2 < \cdots < t_n), \tag{4.4}$$

given $X(s)$, $s < \sigma$, and $X(\sigma) = x$, is identical with the probability distribution of

$$X(t_1),\ X(t_2),\ \ldots,\ X(t_n) \qquad (t_1 < t_2 < \cdots < t_n)$$

given $X(0) = x$, then the Markov process is said to possess the strong Markov
property. There are examples of Markov processes which are not strong Markov.

Such a result as is implied by (4.4) is exceedingly important for purposes of
calculating various probabilistic quantities of interest. In fact, one of the basic
techniques in analyzing stochastic processes and in computing probabilistic
quantities is to write suitable recursion relations, usually in terms of the first
time or last time some specified event happens. As an example, suppose we
attempt to compute $P_{ij}(t)$ by decomposing the implied event in terms of the
event “the first time state $j$ is entered.”

Let $\sigma_{ij}$ be the time at which state $j$ is first entered starting from state $i$. We
pointed out before that first entrance times to a finite set of states are Markov
times and hence in particular $\sigma_{ij}$ is a Markov time. Let

$$F_{ij}(s) = \Pr\{\sigma_{ij} \le s\}.$$

Of course, at the instant $\sigma_{ij}$ when the particle enters state $j$, the probability
distribution of its future history is the same as if we translated time so that
$\sigma_{ij} = 0$; the initial state is $j$ and the Markov chain is governed by the transition
probability matrix $P_{ij}(t)$ in the usual way thereafter. This principle is only
correct provided the strong Markov property holds. Then the relation

$$P_{ij}(t) = \int_0^t P_{jj}(t - s)\, dF_{ij}(s) \tag{4.5}$$

is valid and represents a continuous time analog of formula (5.9) of Chapter 2.
The reader may interpret $dF_{ij}(s)$ as the “density function” $f_{ij}(s)\,ds$ of $\sigma_{ij}$ when
a density exists. The formula (4.5) results by the law of total probabilities where
$dF_{ij}(s)$ is the probability that $s < \sigma_{ij} \le s + ds$ and $P_{jj}(t - s)$ is the transition
probability that if the start is at instant $\sigma_{ij}$ (the particle is then necessarily in
state $j$), then $t - s$ time units later the process is again at $j$.
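
The renewal relation (4.5) can be verified numerically in the simplest case. The sketch below is an illustration under stated assumptions, not taken from the text: it uses a two-state chain with hypothetical rates `a` (from state 0 to state 1) and `b` (from state 1 to state 0), for which the transition probabilities are known in closed form and the first entrance time from 0 into 1 is the exponential holding time in state 0. The integral is approximated with a midpoint rule.

```python
import math

a, b = 1.3, 0.7  # hypothetical rates: 0 -> 1 at rate a, 1 -> 0 at rate b

def p01(t):
    # Closed-form transition probability P_01(t) of the two-state chain.
    return a / (a + b) * (1.0 - math.exp(-(a + b) * t))

def p11(t):
    # Closed-form transition probability P_11(t) of the two-state chain.
    return (a + b * math.exp(-(a + b) * t)) / (a + b)

def first_entrance_density(s):
    # Density of sigma_01: the exponential holding time in state 0.
    return a * math.exp(-a * s)

def renewal_integral(t, n=20000):
    # Midpoint-rule approximation of the right-hand side of (4.5) with i=0, j=1.
    h = t / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += p11(t - s) * first_entrance_density(s)
    return total * h

t = 2.0
lhs = p01(t)
rhs = renewal_integral(t)
print(lhs, rhs)
```

The two printed values should agree up to the quadrature error, which is the content of (4.5) for this chain.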

Numerous other intuitive renewal relations of the type (4.5) can be written
out [the counterparts of (2.1)-(2.3) of Chapter 11]; we emphasize again that
such relations are generally correct provided the strong Markov property
prevails.

The fact that sojourn times of successive visits to a given state (or to two
different prescribed states) enjoy the property that they are independent random
variables is an immediate consequence of the strong Markov property. (The
reader should supply a formal proof.)

Because of the fundamental role of the strong Markov property in the
analysis of Markov processes we close by stating the most welcome result:
Any continuous time, conservative, Markov chain with only stable states is strong
Markov. More generally, the strong Markov property prevails for any Markov
time $\sigma$ for which $X(\sigma) \ne \infty$ and a “nice” realization of the process is used.

From a practical point of view this means that almost all intuitive renewal 
relations for these processes are fully rigorous and can be used without fear of 
error. The proof of the above stated result and a more thorough discussion of 
the strong Markov property for continuous time Markov chains and other 
foundational questions can be found in the book by Chung. (See Notes and 
References at the close of the chapter.) 


Problems 


1. Let $P(t) = \|P_{ij}(t)\|_{i,j=0}^{N}$ denote the transition probability matrix of a continuous time
finite state Markov chain. Show that $\det[P(t)] > 0$ for all $t > 0$.


2. Let $P$ be a $2 \times 2$ stochastic matrix, i.e.,

$$P = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}, \qquad \text{where } \alpha, \beta, \gamma, \delta \ge 0,\ \alpha + \beta = 1,\ \gamma + \delta = 1.$$

Prove there exists a continuous family of $2 \times 2$ stochastic matrices $P(t)$, $t \ge 0$,
such that $P(1) = P$ if and only if

$$\det P = \alpha\delta - \gamma\beta > 0 \quad \text{and} \quad \alpha + \delta > 1.$$

Hint: Use the fact that, under the hypotheses, $P$ can be expressed in the form $e^{A}$ where $A$
has the structure

$$A = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix} \qquad (a, b \ge 0).$$


3. Show that if $P(t) = \|P_{ij}(t)\|_{i,j=0}^{\infty}$ satisfies postulates (a)-(d) of Section 1 then $P(t)$ is
continuous for all $t > 0$.


Hint: Consult Section 8 of Chapter 4.


4. Consider a continuous time irreducible Markov chain with a finite number of states:
$1, 2, \ldots, N$. Let $q_{ij}$, $i, j = 1, \ldots, N$ denote the infinitesimal parameters of the process.
Assume that $q_{ij} = q_{ji}$ for all $i, j = 1, 2, \ldots, N$. Let $P_i(t)$ be the probability that the process
is in state $i$ at time $t$ for some prescribed initial state. Define

$$E(t) = -\sum_{i=1}^{N} P_i(t) \log P_i(t),$$

where $x \log x$ is taken as zero for $x = 0$. Prove that $E(t)$ is a nondecreasing function of $t \ge 0$.

Hint: First prove

$$P_i'(t) = \sum_{j=1}^{N} q_{ij}\,[P_j(t) - P_i(t)].$$

Using this formula then prove

$$\frac{dE(t)}{dt} = \frac{1}{2} \sum_{i,j=1}^{N} q_{ij}\,[P_j(t) - P_i(t)]\,[\log P_j(t) - \log P_i(t)].$$


5. Let $X_t$ ($t \ge 0$) be a conservative continuous time Markov chain, that is $\sum_{j \ne i} q_{ij} = q_i$,
and assume that $q_{ij} > 0$, $i \ne j$, and $0 < q_i < \infty$. Show that $X_t$ is recurrent if there exists
a sequence $z = (z_0, z_1, z_2, \ldots)$ such that (i) $z_n \to \infty$ as $n \to \infty$ and
(ii) $-q_i z_i + \sum_{j \ne i} q_{ij} z_j \le 0$, $i \ge 1$. (Recurrence means that every state is visited infinitely
often or equivalently each state is occupied a total infinite length of time.)


Hint: Use the embedded Markov chain. This is the Markov chain $\{Y_n\}$, $n = 0, 1, 2, \ldots$,
with stationary transition probability matrix $P = \|P_{ij}\|$ where

$$P_{ij} = \frac{q_{ij}}{q_i}, \quad i \ne j; \qquad P_{ii} = 0.$$

The embedded Markov chain $\{Y_n\}$ simply records the sequence of states through which
the process $X_t$ passes without regard to the length of time required for the transitions.
Consult Theorem 4.2 of Chapter 3.


6. Under the set-up of Problem 5 show that $X_t$ is nonrecurrent if there exists a bounded
nonconstant sequence $z = (z_0, z_1, z_2, \ldots)$ such that

$$-z_i q_i + \sum_{j \ne i} q_{ij} z_j = 0, \qquad i \ge 1.$$

Hint: Cf. Section 4 of Chapter 3.

7. For an irreducible aperiodic recurrent Markov chain, show that

$$\lim_{n \to \infty} \left(P_{ii}^{n}\right)^{1/n} = 1.$$

Hint: Use the result of subadditive functions developed in the course of the proof of
Theorem 1.1.


8. Consider an irreducible finite state (say N states) continuous time Markov chain. Let 
A be its infinitesimal matrix. Show that A has rank N — 1. 


9. Consider a two-state, continuous time, conservative Markov chain, and let

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$$

be the infinitesimal matrix of the process. Show that, in this case, the backward equations
become

$$P_{ij}''(t) - (\operatorname{trace} A)\,P_{ij}'(t) + (\det A)\,P_{ij}(t) = 0, \qquad i, j = 1, 2.$$

Solve these equations for

$$A = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}.$$


10. Consider a symmetric random walk in $r$ dimensions, i.e., in a single step the probability
is $1/2r$ of visiting any of the neighboring states. Let $\chi_n$ be the number of self-avoiding walks in
$n$ steps, i.e., the number of $n$-step paths with the property that the same state is not occupied
twice. Prove the simple inequality

$$\chi_{n+m} \le \chi_n \chi_m,$$

and using this relation show that $\lim_{n \to \infty} \chi_n^{1/n}$ exists.
Hint: Use the fact that $\psi_n = \log \chi_n$ is subadditive, i.e., $\psi_{n+m} \le \psi_n + \psi_m$; see also page 139.


11. Let $P(t) = e^{At}$ be the transition probabilities of a finite state Markov chain having
infinitesimal matrix $A = \|a_{ij}\|$. Show that $a_{ij} = 0$ whenever $|i - j| > 1$ if and only if for all
sets of states $i < k$ and $j < l$ the determinant

$$\begin{vmatrix} P_{ij}(t) & P_{il}(t) \\ P_{kj}(t) & P_{kl}(t) \end{vmatrix}$$

is nonnegative.


Hint: A matrix $A$ is said to be totally positive of order 2, written $TP_2$, if

$$\begin{vmatrix} a_{ij} & a_{il} \\ a_{kj} & a_{kl} \end{vmatrix} \ge 0$$

whenever $i < k$ and $j < l$. Show by direct calculation that if $A$ is an infinitesimal matrix
with $a_{ij} = 0$ when $|i - j| > 1$, then $A$ is $TP_2$. Then, for $n$ sufficiently large, $I + (t/n)A$ is
$TP_2$. Next, verify for $2 \times 2$ matrices $U$ and $V$ that $\det(UV) = (\det U)(\det V)$. Thus, products
of $TP_2$ matrices are $TP_2$, whence $(I + (t/n)A)^n$ is $TP_2$ and finally
$P(t) = \lim_{n \to \infty} (I + (t/n)A)^n$ is $TP_2$.

For the converse, evaluate the following for $j > i + 1$:

$$0 \le \det \begin{pmatrix} P_{i,i+1}(t) & P_{i,j}(t) \\ P_{i+1,i+1}(t) & P_{i+1,j}(t) \end{pmatrix}
= \det \begin{pmatrix} a_{i,i+1}t + o(t) & a_{ij}t + o(t) \\ 1 + a_{i+1,i+1}t + o(t) & a_{i+1,j}t + o(t) \end{pmatrix}
= -a_{ij}t + O(t^2).$$

This implies $a_{ij} = 0$ if $j > i + 1$. A similar argument holds for $j < i - 1$.


12. Let $\{X(t) : t \ge 0\}$ be a finite state Markov chain with generator $Q = \|q_{ij}\|$. Set
$q_i = -q_{ii} > 0$. Define $\sigma_0 = 0$ and let $\sigma_n$ be the time of the $n$th change of state of $\{X(t)\}$.
Formally $\sigma_n = \inf\{t > \sigma_{n-1} : X(t) \ne X(\sigma_{n-1}+)\}$. Let $\xi_n = X(\sigma_n+)$ for $n = 0, 1, \ldots$. Then
(cf. Chapter 4) $\{\xi_n\}$ is a discrete time Markov chain whose transition matrix $P = \|P_{ij}\|$
is given by

$$P_{ij} = \begin{cases} q_{ij}/q_i, & i \ne j, \\ 0, & i = j. \end{cases}$$

Suppose $\{X(t)\}$ has a single recurrent class with stationary distribution $\alpha = \|\alpha_i\|$ and
that $\{\xi_n\}$ has stationary measure $\pi = \|\pi_i\|$. Show that $\alpha_i q_i$ is proportional to $\pi_i$.


13. (continuation). Suppose that $\{X(t)\}$ is a continuous time Markov chain whose state
space is $\{0, \pm 1, \pm 2, \ldots\}$ and let $\{\xi_n\}$ be defined as in Problem 12. Show that $\{X(t)\}$ and
$\{\xi_n\}$ have the same recurrent classes, but they need not be positive recurrent together.
(Suppose $q_{i,i+1} = q_{i,i-1} = \tfrac{1}{2} q_i$. Then $\{\xi_n\}$ is null recurrent, but $\{X(t)\}$ is positive recurrent
when $\sum_i 1/q_i < \infty$.)


14. Let $\{X(t)\}$ be a continuous time Markov chain on $\{0, 1, \ldots, N\}$ with absorption at 0
and $N$. Let $A$ be the generator over the transient states. Let $m_{ij}$ be the mean duration spent
in $j$ starting from $i$. Show $M = -A^{-1}$ where $M = \|m_{ij}\|$.


15. (continuation). Let $T_j$ be the total sojourn time in state $j$ and define $m_{ij} = E[T_j \mid X(0) = i]$
for $i, j = 1, \ldots, N - 1$. Show that

$$\Pr\{T_j = 0 \mid X(0) = i\} = 1 - m_{ij}/m_{jj}$$

and

$$\Pr\{0 < T_j \le t \mid X(0) = i\} = (m_{ij}/m_{jj})\,[1 - \exp(-t/m_{jj})], \qquad t > 0.$$


16. (continuation). Let $\zeta$ be the time to absorption in state 0 or $N$. Formally $\zeta = \sum_{j=1}^{N-1} T_j$.
Show that, conditioned on $X(0) = i$, $\zeta$ has probability density function
$f(t) = -e_i Q \exp(Qt) e_i$, where $e_i = \|\delta_{ij}\|$ is the vector with a 1 in the $i$th position and 0 otherwise.


17. Consider an irreducible continuous time Markov chain on the states $\{0, \ldots, M\}$. Let
$(\alpha_0, \ldots, \alpha_M)$ be the stationary distribution. Define

$$\rho_i = \inf\{t > 0 : X(t) \ne i\}, \qquad E_i = \inf\{t > \rho_i : X(t) = i\}.$$

Then $\rho_i$ is called the $i$-residence and $E_i$ the $i$-excursion.
Show that

$$P_{ii}(t) = e^{-q_i t} + \int_0^t P_{ii}(t - s)\, dH_i(s),$$

where $H_i(s) = \Pr\{E_i \le s\}$ and $Q = \|q_{ij}\|$ is the infinitesimal matrix, with $q_i = -q_{ii}$.


18. Let $\{X(t)\}$ be a continuous time Markov chain with state space $S = \{0, 1, \ldots\}$ and
generator $Q$. Let $y$ be a nonzero solution of $Qy = \lambda y$. Show that $Y(t) = e^{-\lambda t} y(X(t))$ is a
martingale with respect to $\{X(t)\}$.
NOTES


This chapter incorporates a bare few topics with expanded detail from Chung [1]. See also 
references therein. 

The bulk of the ideas of this chapter developed in the context of general Markov pro- 
cesses can be found in Dynkin [2] and Doob [3]. 

Kingman’s interesting book [4] incorporates valuable results concerning recurrence 
phenomena and the transition probabilities of general continuous time Markov chains. 
Chapter 15


DIFFUSION PROCESSES 


1: General Description 


A continuous time parameter stochastic process which possesses the (strong) 
Markov property and for which the sample paths X(t) are (almost always)* 
continuous functions of t is called a diffusion process. A number of alternative 
characterizations of diffusion processes are set forth later in this section. 

Apart from their intrinsic interest, diffusion processes are of value for a 
manifold of purposes. 


1. Many physical, biological, economic, and social phenomena are either
well approximated or reasonably modeled by diffusion processes. These
include examples from molecular motions of enumerable particles subject to
interactions, security price fluctuations in a perfect market, some communication
systems with noise, neurophysiological activity with disturbances, variations
of population growth, changes in species numbers subject to competition and
other community relationships, gene substitutions in evolutionary development,
etc.

2. Many functionals, including first-passage probabilities, mean absorption
times, occupation time distributions, boundary behavior properties, and
stationary distributions, can be calculated explicitly for one-dimensional
diffusions. The calculations largely reduce to solving second-order differential
equations with simple boundary conditions (see Sections 3 and 4). In principle,
these formulations and calculations can be extended to multidimensional
diffusions but then involve partial differential equations whose explicit resolution
is usually formidable.


* The phrase “almost always” means with probability 1. For our purposes, without loss of
generality we may and do assume that all sample paths are continuous.

3. By transforming the time scale and renormalizing the state variable, 
many Markov processes can be approximated by diffusion processes. The 
concept of rescaling in a quite general model is described in intuitive form at 
the close of this section, and a variety of examples of these approximations will 
be set forth in Sections 2 and 10. 


Consider a diffusion process $\{X(t), t \ge 0\}$ whose state space is an interval $I$
with endpoints $l < r$, so that $I$ is necessarily of the form $(l, r)$, $(l, r]$, $[l, r)$, or
$[l, r]$ (a parenthesis signifies that the interval is open at that end, while a square
bracket means it is closed). We allow the possibilities $l = -\infty$ and/or
$r = +\infty$. Such a process is said to be regular if starting from any point in the
interior of $I$ any other point in the interior of $I$ may be reached with positive
probability. This may be expressed more precisely through the concept of
hitting time random variables. For any point $z$ in the interval $I$, let $T_z$ denote the
random variable equal to the first time the process attains the value $z$. In the
event that $z$ is never reached, by convention we set $T_z = \infty$. The process is
regular if

$$\Pr\{T_z < \infty \mid X(0) = x\} > 0$$

whenever $l < x, z < r$. Henceforth, without further mention, we shall consider
only regular diffusion processes.

Most Markov processes, including the Poisson process, birth and death
processes, etc., satisfy the property

$$\lim_{h \downarrow 0} \frac{1}{h} \Pr\{|X(h) - x| > \varepsilon \mid X(0) = x\} = \lambda(x, \varepsilon),$$

with $\lambda(x, \varepsilon)$ nonnegative and possibly positive for $\varepsilon$ small. In fact, recall that for
the Poisson process we have

$$\lim_{h \downarrow 0} \frac{1}{h} \Pr\{X(h) - i = 1 \mid X(0) = i\} = \lambda, \qquad i = 0, 1, \ldots, \tag{$*$}$$

where $\lambda$ is the mean rate of the occurrence of events. The sample realizations
of the Poisson process are discontinuous step functions having jumps of unit
increase.

In contrast to ($*$) every diffusion process satisfies the following condition:
For every $\varepsilon > 0$,

$$\lim_{h \downarrow 0} \frac{1}{h} \Pr\{|X(t + h) - x| > \varepsilon \mid X(t) = x\} = 0 \qquad \text{for all } x \text{ in } I. \tag{1.1}$$

This relation asserts that large displacements, of order exceeding a fixed $\varepsilon$, are
very unlikely over sufficiently small time intervals. This fact can be viewed as
a formalization of the property that the sample paths of the process are
continuous. Theorem 1.1 asserts a partial converse: A Markov process for
which (1.1) holds in an appropriate uniform sense is a diffusion process. The
theorem is followed by Lemma 1.1 which establishes the necessary condition
(1.1) as a consequence of moment properties that are usually satisfied in
practice.
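
The contrast between the Poisson relation and condition (1.1) can be made concrete with exact formulas. The sketch below is illustrative only: the function names, the rate 2.0, and the threshold 0.1 are hypothetical choices, and standard Brownian motion is used as the example of a diffusion. For the Poisson process the ratio tends to the rate, whereas for Brownian motion it collapses to zero once h is small compared with the square of the threshold.

```python
import math

def poisson_rate(h, lam=2.0):
    # (1/h) Pr{N(h) >= 1} for a Poisson process of rate lam;
    # this equals (1/h) Pr{|X(h) - x| > eps} for any eps in (0, 1).
    return (1.0 - math.exp(-lam * h)) / h

def brownian_rate(h, eps=0.1):
    # (1/h) Pr{|B(h)| > eps} for standard Brownian motion, B(h) ~ N(0, h),
    # using Pr{|Z| > x} = erfc(x / sqrt(2)) for a standard normal Z.
    return math.erfc(eps / math.sqrt(2.0 * h)) / h

for h in (1e-2, 1e-3, 1e-4, 1e-5):
    print(f"h={h:g}  Poisson: {poisson_rate(h):.4f}  Brownian: {brownian_rate(h):.3e}")
```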

Almost all diffusion processes that have appeared to date as models of
physical or biological phenomena are characterized by two basic conditions
which augment (1.1) and describe the mean and variance of the infinitesimal
displacements. Let $\Delta_h X(t)$ be the increment in the process accrued over a time
interval of length $h$. Thus, $\Delta_h X(t) = X(t + h) - X(t)$. These key conditions
affirm the existence of the limits

$$\lim_{h \downarrow 0} \frac{1}{h} E[\Delta_h X(t) \mid X(t) = x] = \mu(x, t) \tag{1.2}$$

and

$$\lim_{h \downarrow 0} \frac{1}{h} E[\{\Delta_h X(t)\}^2 \mid X(t) = x] = \sigma^2(x, t) \tag{1.3}$$

whenever $l < x < r$. The functions $\mu(x, t)$ and $\sigma^2(x, t)$ are termed the
infinitesimal parameters of the process, and, in particular, $\mu(x, t)$ is called the drift
parameter, infinitesimal mean, or expected infinitesimal displacement and $\sigma^2(x, t)$
the diffusion parameter, or infinitesimal variance.

Generally, $\mu(x, t)$ and $\sigma^2(x, t)$ are continuous functions of $x$ and $t$, and a
regular process typically has $\sigma^2(x, t)$ positive for all $l < x < r$ and $t > 0$. (But
see Problem 26.) Indeed, some texts (especially those with emphasis on
practical models) define $\{X(t), t \ge 0\}$ to be a diffusion if $X(t)$ is a Markov
process satisfying (1.1) and if (1.2) and (1.3) give continuous functions of $x$ and
$t$.

It is tacitly assumed in relations (1.2) and (1.3) that the displayed moments
exist. A more general version presents these relations in terms of truncated
moments where $\Delta_h X(t)$ is replaced by

$$\Delta_{h,\varepsilon} X(t) = \begin{cases} \Delta_h X(t) & \text{if } |\Delta_h X(t)| \le \varepsilon, \\ 0 & \text{otherwise.} \end{cases}$$

Formally, the conditions (1.2) and (1.3) are replaced by

$$\lim_{h \downarrow 0} \frac{1}{h} E[\Delta_{h,\varepsilon} X(t) \mid X(t) = x] = \mu(x, t) \tag{1.2$'$}$$

$$\lim_{h \downarrow 0} \frac{1}{h} E[\{\Delta_{h,\varepsilon} X(t)\}^2 \mid X(t) = x] = \sigma^2(x, t), \tag{1.3$'$}$$

respectively, and of course (1.1) must be adjoined to (1.2′) and (1.3′) to be
assured that the process is a diffusion.
The use of $\Delta_{h,\varepsilon} X(t)$ in place of $\Delta_h X(t)$ adds only technical complexity and
essentially no new ideas. Moreover, in most (if not all) practical examples of
diffusions, the indicated moments exist. Accordingly, we stipulate henceforth
that the relations (1.2) and (1.3) hold. Moreover we concentrate primarily on
the time homogeneous case where the functions $\mu(x, t) = \mu(x)$ and
$\sigma^2(x, t) = \sigma^2(x)$ are independent of $t$.

The motivation for the name “infinitesimal mean” for $\mu(x)$ is clear, since

$$E[\Delta_h X(t) \mid X(t) = x] = \mu(x)h + o_1(h),$$

where $o_1(h)$ accumulates the remainder terms of order less than $h$ as $h \downarrow 0$. For
the variance we have

$$\begin{aligned}
\operatorname{var}[\Delta_h X(t) \mid X(t) = x] &= E[\{\Delta_h X(t)\}^2 \mid X(t) = x] - \{E[\Delta_h X(t) \mid X(t) = x]\}^2 \\
&= \sigma^2(x)h + o_2(h) - [\mu(x)h + o_1(h)]^2 \\
&= \sigma^2(x)h + o_3(h),
\end{aligned}$$

where

$$o_3(h) = o_2(h) - [\mu(x)h + o_1(h)]^2$$

is again a remainder term of order smaller than $h$.
In addition to the infinitesimal relations (1.2) and (1.3), the following
higher-order infinitesimal moment relations are usually satisfied:

$$\lim_{h \downarrow 0} \frac{1}{h} E[|\Delta_h X(t)|^r \mid X(t) = x] = 0, \qquad r = 3, 4, \ldots. \tag{1.4}$$
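
As a worked illustration of (1.2)-(1.4), consider Brownian motion with constant drift $\mu$ and variance parameter $\sigma^2$. This example is chosen here for convenience and is an assumption of the sketch rather than part of the surrounding argument. Its increments are Gaussian, so the moments can be written out exactly:

```latex
\begin{align*}
\Delta_h X(t) &\sim N(\mu h,\ \sigma^2 h), \\
\frac{1}{h} E[\Delta_h X(t) \mid X(t) = x] &= \mu, \\
\frac{1}{h} E[\{\Delta_h X(t)\}^2 \mid X(t) = x] &= \sigma^2 + \mu^2 h \;\longrightarrow\; \sigma^2, \\
\frac{1}{h} E[\{\Delta_h X(t)\}^4 \mid X(t) = x] &= 3\sigma^4 h + 6\mu^2\sigma^2 h^2 + \mu^4 h^3 \;\longrightarrow\; 0
\qquad (h \downarrow 0).
\end{align*}
```

The second line shows why the squared drift term does not contribute to the infinitesimal variance: it enters at order $h^2$ and disappears after division by $h$.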


Sometimes (and frequently in the physical literature) a diffusion process is
defined to be a Markov process obeying the infinitesimal moment conditions
(1.2) and (1.3), and (1.4) is stipulated to hold only for some $r$, $r > 2$. In fact,
subject to smoothness assumptions on $\mu(x)$ and $\sigma^2(x)$, a process satisfying (1.2),
(1.3), and (1.4) for some $r > 2$ can be realized having only continuous paths. To
formalize this approach is arduous and unnatural in some ways, and therefore
the modern setup is to define a diffusion as we have done, as a strong Markov
process with continuous sample paths.


MULTIVARIATE DIFFUSION PROCESSES 


Although we concentrate on one-dimensional diffusions, it is often relevant to
consider a vector diffusion process, e.g., a diffusion on the state space $I = E^n$
(Euclidean $n$-space) or an open region of $E^n$. Let $X(t) = (X_1(t), X_2(t), \ldots, X_n(t))$
be a vector process. The analogs of the infinitesimal relations (1.2) and (1.3) are

$$\lim_{h \downarrow 0} \frac{1}{h} E[X_i(t + h) - X_i(t) \mid X(t) = x = (x_1, x_2, \ldots, x_n)] = \mu_i(x, t), \qquad i = 1, 2, \ldots, n, \tag{1.2$''$}$$

and

$$\lim_{h \downarrow 0} \frac{1}{h} E[\{X_i(t + h) - X_i(t)\}\{X_j(t + h) - X_j(t)\} \mid X(t) = x] = \sigma_{ij}(x, t), \qquad i, j = 1, \ldots, n. \tag{1.3$''$}$$

The matrix $\|\sigma_{ij}(x, t)\|_{i,j=1}^{n}$ is required to be positive definite, i.e.,

$$\sum_{i,j=1}^{n} \sigma_{ij}(x, t)\, a_i a_j > 0 \qquad \text{for all nontrivial real $n$-tuples } (a_1, a_2, \ldots, a_n) \text{ and } x \in I,$$

consistent with the positive value of

$$\sum_{i,j=1}^{n} a_i a_j \{X_i(t + h) - X_i(t)\}\{X_j(t + h) - X_j(t)\} = \left\{\sum_{\nu=1}^{n} a_\nu [X_\nu(t + h) - X_\nu(t)]\right\}^2.$$

The higher-order moments are again taken to be negligible compared to $h$,
e.g.,

$$\lim_{h \downarrow 0} \frac{1}{h} E[\{X_i(t + h) - X_i(t)\}^4 \mid X(t) = x] = 0, \qquad \text{for } i = 1, \ldots, n.$$
hlo 


CONSERVATIVE PROCESSES AND DIFFUSIONS WITH KILLING 


A process is called a diffusion with killing if the sample paths behave like those
of a regular diffusion until a possibly random, possibly infinite time $\zeta$ when
the process is killed. There are many natural models, especially in biological
and physical contexts, for which this generalization is appropriate; see Section
2, Example G and Section 10.

We write a diffusion with killing in the form $\{X(t), 0 \le t < \zeta\}$. Since we
allow $\zeta = \infty$, this includes the case where no killing occurs. If $\zeta = \infty$ from all
starting points $X(0) = x$, the process is said to be conservative, and we write
$\{X(t), t \ge 0\}$. That is, a regular diffusion $\{X(t)\}$ on $I$ is conservative if

$$\Pr\{X(t) \in I \mid X(0) = x\} = \Pr\{\zeta > t \mid X(0) = x\} = 1$$

for all $t \ge 0$ and $x$ in $I$.

Many authors append a distinguished point $\Delta$ (called “the cemetery”) to
the interval $I$, thereby enlarging the state space to $I \cup \{\Delta\}$, with the
prescription that $X(t) = \Delta$ for $t \ge \zeta$. This construction serves as a technical
convenience, ensuring that the process is always conservative on the augmented
state space $I \cup \{\Delta\}$. However, the convention needlessly encumbers the
notation in applications, where it is preferred simply to allow $X(t)$ to be
undefined for $t \ge \zeta$.

For a diffusion with killing, at each point $x$ there is a probability
$k(x)\,dt + o(dt)$ that the process ceases (is killed) over the infinitesimal time
duration $(t, t + dt)$, and probability $1 - k(x)\,dt + o(dt)$ that no killing occurs.
The above prescription is merely descriptive. Its rigorous formulation requires
the concept of stochastic multiplicative functionals. We shall say more on this
in Section 12. Notice that the killing rate can depend on the position and even
on time, if we allow $k(x, t)$ to be a function of time $t$ as well as position $x$. In
symbols,

$$\lim_{h \downarrow 0} \frac{1}{h} \Pr\{t < \zeta < t + h \mid X(t) = x\} = k(x, t).$$

There is a canonical construction of the process governed by the
infinitesimal rates (1.2) and (1.3) with infinitesimal killing rate $k(x, t)$ that is
expressed in terms of a conservative diffusion process having the same mean
and diffusion coefficients.
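
One standard way to realize a killing time from a conservative path is to kill the process when the accumulated hazard, the integral of the killing rate along the path, first exceeds an independent exponential random variable with mean 1. The sketch below assumes this construction, which may or may not be the one intended by the text, and works on a path already discretized on a time grid; the function name `killing_time` and its arguments are hypothetical.

```python
import random

def killing_time(k_values, dt, threshold):
    """First grid time at which the accumulated hazard exceeds `threshold`.

    k_values[n] is the killing rate k(X(n*dt)) along one discretized path of
    the conservative process; `threshold` plays the role of an independent
    exponential draw with mean 1. Returns None if the path survives the grid.
    """
    hazard = 0.0
    for n, k in enumerate(k_values):
        hazard += k * dt  # Riemann-sum approximation of the integral of k
        if hazard > threshold:
            return (n + 1) * dt
    return None

# Deterministic check with a constant rate: the hazard 2.0 * t passes 0.995
# on the grid step ending at t = 0.5.
zeta_fixed = killing_time([2.0] * 100, 0.01, 0.995)

# Randomized use: draw the exponential threshold independently of the path.
rng = random.Random(0)
zeta_random = killing_time([2.0] * 1000, 0.01, rng.expovariate(1.0))
```

Under this construction the conditional survival probability given the path is the exponential of minus the accumulated hazard, which reproduces the infinitesimal killing probability k(x) dt + o(dt) described above. A path with zero killing rate is never killed, matching the conservative case.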


HITTING TIMES 


Hitting times of points and sets play a fundamental role in the study of
one-dimensional diffusion processes. Formally we define the hitting time of the
process $\{X(t), 0 \le t < \zeta\}$ to the level $z$ by

$$T_z = \begin{cases} \infty & \text{if } X(t) \ne z \text{ for } 0 \le t < \zeta, \\ \inf\{t \ge 0;\ X(t) = z\} & \text{otherwise.} \end{cases}$$

For typographical convenience we write $T(z)$ interchangeably for $T_z$, the
meaning being clear from the context.
We use the notation

$$T^* = T_{a,b} = T(a, b) = \min\{T(a), T(b)\} = T(a) \wedge T(b)$$

for the hitting time to $a$ or $b$, the first time $X(t) = a$ or $X(t) = b$. For processes
starting at $X(0) = x$ in $(a, b)$, this is the same as the exit time of the interval
$(a, b)$:

$$T(a, b) = \inf\{t \ge 0;\ X(t) \notin (a, b)\}, \qquad X(0) = x \text{ in } (a, b).$$
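
For a path observed only on a time grid, hitting and exit times can be approximated as follows. This is a minimal sketch with hypothetical names and data: the path is assumed continuous between grid points, so a sign change of X(t) - z across a step means the level was crossed inside that step, and the crossing is reported at the right end of the step.

```python
import math

def hitting_time(times, values, z):
    """Approximate T(z): first grid time at which the path equals or has crossed z."""
    for n in range(len(times)):
        if values[n] == z:
            return times[n]
        if n > 0 and (values[n - 1] - z) * (values[n] - z) < 0:
            return times[n]  # a continuous path crossed z inside this step
    return math.inf  # level never reached on the observed grid

def exit_time(times, values, a, b):
    """T(a, b) = min{T(a), T(b)} for a path started inside (a, b)."""
    return min(hitting_time(times, values, a), hitting_time(times, values, b))

times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
path = [0.0, 0.4, -0.3, 0.8, 1.2, -1.5]

t_upper = hitting_time(times, path, 1.0)   # crossed between 0.3 and 0.4
t_lower = hitting_time(times, path, -1.0)  # crossed between 0.4 and 0.5
t_exit = exit_time(times, path, -1.0, 1.0)
```

The exit time from (-1, 1) coincides with the earlier of the two hitting times, as in the identity T(a, b) = T(a) ∧ T(b).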


WHEN IS A STOCHASTIC PROCESS RECOGNIZED AS A DIFFUSION?†


It is of value to have verifiable sufficient conditions under which a Markov
process is (or can be realized as) a diffusion process. To this end, we introduce
the concept of a standard‡ process. A strong Markov process $\{X(t), t \ge 0\}$ is
called a standard process if the sample paths possess the following regularity
properties:


(i) $X(t)$ is right continuous; i.e., for all $s \ge 0$

$$\lim_{t \downarrow s} X(t) = X(s);$$

(ii) left limits of $X(t)$ exist; i.e.,

$$\lim_{t \uparrow s} X(t) \text{ exists for all } s > 0;$$

and

(iii) $X(t)$ is continuous from the left through Markov times; i.e., if
$T_1 \le T_2 \le \cdots$ are Markov times (see Sections 3, 7, and 8 of Chapter 6)
converging to $T \le \infty$, then $\lim_{n \to \infty} X(T_n) = X(T)$ whenever $T < \infty$. (To be
precise, equality holds in the almost sure sense in which
$\Pr\{T < \infty \text{ and } \lim_{n \to \infty} X(T_n) \ne X(T)\} = 0$.)


† The remainder of this section can be skipped at first reading.
‡ The definition of “standard” given here is for a process without killing.

The third condition is sometimes referred to as quasi-left continuity. It does 
not imply that the sample paths are left continuous, but only that jumps 
cannot be predicted. As an example, the Poisson process is a standard process 
that exhibits jump discontinuities. The random times T, = T — 1/n, where T is 
the time of the first jump, are not Markov times. 


In particular, a sample path of a standard process can at worst exhibit 
discontinuities of the first kind (jumps). 

A deep and remarkable result is that every strong Markov process
$\{X(t), t \ge 0\}$ continuous in probability (see below) and subject to mild regularity
conditions, possesses an equivalent version $\{\tilde{X}(t), t \ge 0\}$ which is a standard
process. (The processes $X(t)$ and $\tilde{X}(t)$ are equivalent provided that $X(t)$ and $\tilde{X}(t)$
have the same finite-dimensional distributions.) Recall that a stochastic
process is continuous in probability if for any $\varepsilon > 0$ and $s \ge 0$

$$\lim_{t \to s} \Pr\{|X(t) - X(s)| > \varepsilon\} = 0. \tag{1.5}$$

Property (1.5) is quite weak and is satisfied in most stochastic models, and thus
for all practical purposes we can assume that the Markov process at hand is a
standard process.

A sufficient condition that a standard process be a diffusion is the
fulfillment of the Dynkin condition:

$$\frac{1}{h} \Pr\{|X(t + h) - X(t)| > \varepsilon \mid X(t) = x\} \to 0 \qquad \text{when } \varepsilon > 0, \tag{1.6}$$

as $h \downarrow 0$, where the convergence prevails uniformly for $x$ restricted to any
compact subinterval of $(l, r)$ and $t$ traversing any finite interval $[0, N]$. (We
emphasize that uniformity is crucial.) The following theorem asserts the
sufficiency of this condition.


Theorem 1.1. Let $\{X(t), t \ge 0\}$ be a standard process (see (i)-(iii) above) and
suppose the Dynkin condition holds. Then $\{X(t), t \ge 0\}$ is a diffusion process.


Proof. We give the proof assuming that the state space $I$ is a closed bounded
interval so that the convergence in the Dynkin condition (1.6) holds uniformly
for all $x$.
Take an arbitrary number $\delta > 0$, a fixed integer $N > 0$, and a sequence of
integers $k = 1, 2, \ldots$. Consider the events

$$A_N(k, i) = \left\{ \left| X\!\left(\frac{iN}{k}\right) - X\!\left(\frac{(i-1)N}{k}\right) \right| > \tfrac{1}{2}\delta \right\}, \qquad i = 1, \ldots, k.$$

That is, $A_N(k, i)$ consists of all sample paths that move more than $\tfrac{1}{2}\delta$ between
times $(i - 1)N/k$ and $iN/k$. Consider the event

$$B_\delta = \left\{ \sup_{0 < t \le N} |X(t) - X(t-)| > \delta \right\}$$

consisting of all realizations exhibiting a jump exceeding $\delta$ in the interval
$0 < t \le N$. Since the sample paths all have left limits and are right continuous,
we claim that

$$B_\delta \subset \bigcup_{j=1}^{\infty} \bigcap_{k=j}^{\infty} \bigcup_{i=1}^{k} A_N(k, i). \tag{1.7}$$

Indeed, any sample path in $B_\delta$ must have a point $t$ ($0 < t \le N$), where
$|X(t) - X(t-)| > \delta$. For all $k$ sufficiently large and appropriate $i$ satisfying
$(i - 1)N/k < t \le iN/k$, we can guarantee the validity of (1.7) by relying on the
property that $X(t)$ is right continuous and has a limit from the left at $t$. Now,
from (1.7) we deduce

$$\begin{aligned}
\Pr\{B_\delta\} &\le \Pr\left\{ \bigcup_{j=1}^{\infty} \bigcap_{k=j}^{\infty} \bigcup_{i=1}^{k} A_N(k, i) \right\} \\
&= \lim_{j \to \infty} \Pr\left\{ \bigcap_{k=j}^{\infty} \bigcup_{i=1}^{k} A_N(k, i) \right\} \\
&\le \lim_{k \to \infty} \Pr\left\{ \bigcup_{i=1}^{k} A_N(k, i) \right\} \\
&\le \lim_{k \to \infty} \sum_{i=1}^{k} \Pr\{A_N(k, i)\}.
\end{aligned} \tag{1.8}$$

But with $h = N/k$, the uniformity in the Dynkin condition (1.6) implies

$$\Pr\{A_N(k, i)\} \le \varepsilon(\delta, k)(N/k),$$

where $\varepsilon(\delta, k) \to 0$ as $k \to \infty$. Adding this over $i$, we obtain

$$\sum_{i=1}^{k} \Pr\{A_N(k, i)\} \le N \varepsilon(\delta, k). \tag{1.9}$$

Sending $k \to \infty$, and comparing (1.8) and (1.9), we deduce that

$$\Pr\{B_\delta\} \le N \lim_{k \to \infty} \varepsilon(\delta, k) = 0. \tag{1.10}$$

This holds for every $\delta > 0$. Consequently

$$\Pr\left\{ \sup_{0 < t \le N} |X(t) - X(t-)| > 0 \right\} \le \sum_{\substack{\delta > 0 \\ \delta \text{ rational}}} \Pr\left\{ \sup_{0 < t \le N} |X(t) - X(t-)| > \delta \right\} = 0.$$

Thus, with probability 1, each path is left continuous, and therefore
continuous, for $0 \le t \le N$. But $N$ is arbitrary and a standard process is right
continuous, so $X(t)$ is continuous at all $t \ge 0$. The proof is complete. ∎


The following lemma is valuable and will be cited on several occasions:

Lemma 1.1. If a standard process satisfies the infinitesimal moment condition

$$\lim_{h \downarrow 0} \frac{1}{h} E[|\Delta_h X(t)|^p \mid X(t) = x] = 0 \tag{1.11}$$

for some $p > 2$ uniformly for $x$ in any compact subinterval of $(l, r)$, and $t$ in any finite
interval $[0, N]$, then the Dynkin condition (1.6) is satisfied.


Proof. A direct application of the Chebyshev inequality gives

$$\frac{1}{h} \Pr\{|\Delta_h X(t)| > \varepsilon \mid X(t) = x\} \le \frac{E[|\Delta_h X(t)|^p \mid X(t) = x]}{h \varepsilon^p} \tag{1.12}$$

and the desired result is clear. ∎


Remark. The inequality in (1.12) applies for any $p > 0$ but its utility for
diffusions ordinarily requires $p > 2$, with $p = 4$ often being the most
convenient choice.
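
To see why $p = 2$ is not enough while $p = 4$ works, it may help to apply (1.12) to standard Brownian motion. This example is chosen here for illustration only; the increment $\Delta_h X(t)$ is then $N(0, h)$, so that $E[|\Delta_h X(t)|^2] = h$ and $E[|\Delta_h X(t)|^4] = 3h^2$:

```latex
\begin{align*}
p = 2: &\quad \frac{1}{h} \Pr\{|\Delta_h X(t)| > \varepsilon \mid X(t) = x\}
   \le \frac{h}{h\,\varepsilon^2} = \frac{1}{\varepsilon^2}, \\
p = 4: &\quad \frac{1}{h} \Pr\{|\Delta_h X(t)| > \varepsilon \mid X(t) = x\}
   \le \frac{3h^2}{h\,\varepsilon^4} = \frac{3h}{\varepsilon^4} \;\longrightarrow\; 0
   \qquad (h \downarrow 0).
\end{align*}
```

The second-moment bound stays constant in $h$ and so gives no information about (1.6), whereas the fourth-moment bound tends to zero uniformly in $x$ and $t$.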

Applications of Theorem 1.1 and Lemma 1.1 will be made in the later 
sections. 

Another criterion frequently used to check that a one-dimensional stochastic
process $X(t)$ (not necessarily possessing the Markov property) has a
continuous path realization is the condition of Kolmogorov now stated.

Let $\{X(t), t \ge 0\}$ be a stochastic process obeying the bound

$$E[|X(t) - X(s)|^\gamma] \le C\,|\varphi(t) - \varphi(s)|^{1+\alpha} \qquad \text{for all } s, t \ge 0, \tag{1.13}$$

where $\alpha$, $\gamma$, and $C$ are positive constants independent of $s$ and $t$ and $\varphi$ is a
continuous nondecreasing function. Then there exists an equivalent version
$\tilde{X}(t)$ possessing continuous paths.

For a $d$-dimensional process the Kolmogorov condition must be modified
to

$$E[|X(t) - X(s)|^\gamma] \le C\,|\varphi(t) - \varphi(s)|^{d+\alpha}$$

with $\alpha$, $\gamma$, and $C$ positive as before and the exponent on the right now
exceeding $d$.
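
As a quick check of the one-dimensional criterion (1.13), standard Brownian motion $B(t)$ can be used as a test case. The choice of example and of the constants below is ours, not the text's. Since $B(t) - B(s)$ is $N(0, |t - s|)$,

```latex
\begin{align*}
E[|B(t) - B(s)|^4] &= 3\,|t - s|^2 = C\,|\varphi(t) - \varphi(s)|^{1+\alpha}
   \quad \text{with } \gamma = 4,\ C = 3,\ \alpha = 1,\ \varphi(t) = t, \\
E[|B(t) - B(s)|^2] &= |t - s| ,
\end{align*}
```

so the fourth moment satisfies (1.13), while the second moment has exponent exactly 1 on the right and therefore does not by itself meet the requirement $1 + \alpha > 1$.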
CHARACTERIZATIONS OF DIFFUSION PROCESSES


It is useful to conclude this section by briefly and informally reviewing several
alternative characterizations of diffusion processes.


(a) Definition of a Diffusion in Terms of the Infinitesimal Coefficients


We have already remarked with a view to applications that the infinitesimal
parameters $\mu(x, t)$ and $\sigma^2(x, t)$, assumed to be smooth with $\sigma^2(x, t) > 0$ for
$l < x < r$, summarize the basic information at hand. When appropriate
higher-order moment restrictions as in (1.4) are in force, then apart from
boundary behavior the probability laws governing the realizations of the
diffusion are determined from the nature of the infinitesimal mean and
variance coefficients.

The classification of all possible diffusions with prescribed infinitesimal
coefficients depends on a delineation of the motion on or off the boundary.
Some of the problems in these constructions will be exemplified in our
discussions of the boundary behavior for the diffusions covered in Sections 6-8.
Our primary approach in dealing with diffusions will be to operate in terms
of the infinitesimal coefficients and the moment conditions (1.2)-(1.4).


(b) The Stroock—Varadhan Martingale Characterization of Diffusions 


Let X(t) be a time homogeneous diffusion with drift $\mu(x)$ and variance $\sigma^2(x)$ for 
$l < x < r$. 
Consider for each real $\lambda$ the stochastic process 

\[ Y_\lambda(t) = \exp\left[ \lambda X(t) - \lambda \int_0^t \mu(X(s))\,ds - \tfrac12 \lambda^2 \int_0^t \sigma^2(X(s))\,ds \right] \tag{1.14} \]

defined for $t \ge 0$. A very useful result affirms under proper regularity and 
boundary constraints that $\{Y_\lambda(t)\}$ constitutes a martingale with respect to the 
family of $\sigma$-fields $\mathcal{F}_t$ determined by the process X(t) up to time t. When X(t) is 
standard Brownian motion so that $\mu(x) = 0$ and $\sigma^2(x) = 1$, then plainly 

\[ Y_\lambda(t) = \exp\left( \lambda X(t) - \lambda^2 t/2 \right), \]

which is a familiar martingale process encountered in Chapter 7. Subject to 
suitable technical requirements a converse theorem holds as follows: If $Y_\lambda(t)$ is 
a continuous time martingale process for each real $\lambda$, then {X(t)} is a diffusion. 

A related martingale characterization, quite useful in some contexts, is the 
following. Subject to suitable technical requirements, $\{X(t), t \ge 0\}$ is a diffusion 
process with infinitesimal coefficients $\mu(x)$ and $\sigma^2(x)$ if and only if for every 
bounded and twice continuously differentiable function f(x) the process 

\[ Z_f(t) = f(X(t)) - f(X(0)) - \int_0^t \left[ \tfrac12 \sigma^2(X(s)) f''(X(s)) + \mu(X(s)) f'(X(s)) \right] ds \]


is a martingale. 
When X(t) is standard Brownian motion and f(x) = x, we recover the well-known 
fact that $Z_f(t) = X(t)$ is a martingale. When $f(x) = x^2$ then $Z_f(t) = X(t)^2 - t$ 
is also a martingale. (See Section 5 of Chapter 7.) 

The above martingale characterizations are dealt with further in Sections 
11, 12, and 16. 
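
As a rough numerical companion to the second martingale characterization, the integrand $\tfrac12\sigma^2 f'' + \mu f'$ can be evaluated by finite differences. The sketch below is an illustration only: the helper name `generator`, the step size `eps`, and the use of central differences are assumptions, not part of the text. For standard Brownian motion the integrand is approximately 0 when f(x) = x and approximately 1 when f(x) = x^2, which is why subtracting the integral leaves X(t) and X(t)^2 - t in the two cases just mentioned.

```python
def generator(f, mu, sigma2, x, eps=1e-3):
    """Approximate 0.5*sigma2(x)*f''(x) + mu(x)*f'(x) by central differences.

    f, mu and sigma2 are plain Python callables; eps is an illustrative step.
    """
    d1 = (f(x + eps) - f(x - eps)) / (2.0 * eps)
    d2 = (f(x + eps) - 2.0 * f(x) + f(x - eps)) / eps ** 2
    return 0.5 * sigma2(x) * d2 + mu(x) * d1


# Standard Brownian motion: mu(x) = 0, sigma^2(x) = 1.
def bm_mu(x):
    return 0.0


def bm_sigma2(x):
    return 1.0


# Integrand of Z_f for f(x) = x and for f(x) = x^2 at a sample point.
lin = generator(lambda u: u, bm_mu, bm_sigma2, 1.5)
quad = generator(lambda u: u * u, bm_mu, bm_sigma2, 1.5)
```

The same helper accepts any drift and variance functions, so it can be reused for the other diffusions of this chapter.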


(c) Stochastic Differential Equations 


A useful tool in modeling physical, biological, and economic phenomena is the 
concept and solution of stochastic differential equations. The procedure is to 
extend the method of successive approximations used to solve ordinary 
differential equations to the stochastic context. The solutions obtained are 
diffusions. An introductory account of these ideas is set forth in Section 14 and 
further elaborated in Sections 15 and 16. 


(d) Total Positivity Characterization of One-Dimensional Diffusions 


Consider a diffusion process with state space a segment (l, r) of the real line. 
For ease of exposition assume the process is governed by a transition density 


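\[ p(t, x, y)\,dy \approx \Pr\{y < X(t) < y + dy \mid X(0) = x\}. \]
Choose 2r points inside the state space satisfying 
\[ x_1 < x_2 < \cdots < x_r, \qquad y_1 < y_2 < \cdots < y_r \tag{1.15} \]
and form the determinant 

\[ \det\|p(t, x_i, y_j)\| = \det \begin{pmatrix} p(t, x_1, y_1) & p(t, x_1, y_2) & \cdots & p(t, x_1, y_r) \\ p(t, x_2, y_1) & p(t, x_2, y_2) & \cdots & p(t, x_2, y_r) \\ \vdots & \vdots & & \vdots \\ p(t, x_r, y_1) & p(t, x_r, y_2) & \cdots & p(t, x_r, y_r) \end{pmatrix}. \tag{1.16} \]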


It is a remarkable fact that when X(t) is a one-dimensional diffusion, then the 
quantities of (1.16) are always positive. 

The converse, under mild regularity requirements imposed on p(t, x, y), is 
also valid. Thus, if X(t) is a strong Markov process and (1.16) is positive for 
each t and all choices of (1.15) (r = 2 suffices), then X(t) is equivalent to a 
diffusion. 

The assumption of the existence of a density is not necessary in that the 
determinantal inequalities can be reexpressed in terms of the transition 
probability distribution. 

The determinantal expression (1.16) has a probabilistic interpretation 
which we elaborate in Problem 21. 
SOME COMMENTS ON APPLIED STOCHASTIC MODELING AND RELATED DIFFUSIONS 


Many stochastic models evolving in discrete time have the structure 


\[ Z_{k+1} = f(Z_k, s_k) + \xi_k, \qquad k = 0, 1, 2, \ldots, \tag{1.17} \]


or more generally 
\[ Z_{k+1} = f(Z_k, s_k, \xi_k). \tag{1.18} \]


(A broad spectrum of examples of (1.17) and (1.18) is dispersed throughout 
the chapter.) Thus, the process variable $Z_{k+1}$ at generation k + 1 is influenced 
by two types of forces. The first contribution depends on the current state 
variable $Z_k$ and other systematic or randomly varying parameters $\{s_k\}$ whose 
effects may be temporally correlated. The $\{s_k\}$ factors are coupled to $Z_k$ 
through the function f. The second contribution embodied in $\xi_k$ is generally 
regarded as a "noise" perturbation. In ecological, genetic, and other biological 
contexts, the process $\{s_k\}$ is sometimes referred to as stochastic or deterministic 
environmental effects while $\{\xi_k\}$ is regarded as sampling or demographic effects. 

The analysis of $\{Z_k\}$ following (1.17) is often facilitated by passing to an 
analogous diffusion (or stochastic differential equation) model which is in 
many instances more tractable, while retaining the relevant qualitative 
consequences. 

In applied stochastic modeling, there arise families of discrete or continuous 
time Markov chains $Z^{(N)}(t)$ indexed by a natural parameter N. 
Depending on the problem and objectives, it is frequently natural to rescale 
the time and state variables, producing the transformed process 

\[ X^{(N)}(t) = a(N)\left[ Z^{(N)}(b(N)t) - c(N) \right] \tag{1.19} \]

where c(N) is a centering constant, a(N) performs a scaling of the state 
variable, and b(N) performs the required time scaling. Accordingly, a unit of 
time in the $X^{(N)}$ process corresponds to an epoch of about b(N) time units in 
the original $Z^{(N)}$ process. 

By judicious rescalings, we often uncover the convergence phenomenon 


\[ X^{(N)}(t) \to X(t) \tag{1.20} \]
(convergence in the sense defined in (1.24) and (1.25)) where X(t) is a diffusion. 
In this way properties of certain functionals of the process $X^{(N)}(t)$ can be 
obtained from the calculation of the corresponding functionals for the X(t) 
process. We amply illustrate the concept and mechanism of (1.19) and (1.20) in 
terms of concrete examples in Sections 2 and 10. 


CONVERGENCE TO DIFFUSIONS 


Consider the sequence of discrete time stochastic processes $X^{(N)} = \{X_k^{(N)}\}$ with 
state space confined to a closed bounded interval of the real line adapted† to 
an increasing sequence $\{\mathcal{F}_k^{(N)}, k \ge 0\}$ of $\sigma$-fields. It is usual to take $\mathcal{F}_k^{(N)}$ to be 
the $\sigma$-field determined by the process values $\{X_i^{(N)}, i \le k\}$; cf. Chapter 6. The 
processes $X^{(N)}$ need not possess the Markov property. 

† That is, $X_k^{(N)}$ is measurable with respect to $\mathcal{F}_k^{(N)}$ for every $k \ge 0$. 

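The conditional moments of $\Delta X_k^{(N)} = X_{k+1}^{(N)} - X_k^{(N)}$ are assumed to satisfy 

\[ \begin{aligned} E\left[\Delta X_k^{(N)} \mid \mathcal{F}_k^{(N)}\right] &= h_N\,\mu(X_k^{(N)}) + \varepsilon_{1,k}^{(N)}, \\ E\left[(\Delta X_k^{(N)})^2 \mid \mathcal{F}_k^{(N)}\right] &= h_N\,\sigma^2(X_k^{(N)}) + \varepsilon_{2,k}^{(N)}, \end{aligned} \tag{1.21} \]

and 
\[ E\left[(\Delta X_k^{(N)})^4 \mid \mathcal{F}_k^{(N)}\right] = \varepsilon_{3,k}^{(N)}, \tag{1.22} \]

where $h_N > 0$ and $h_N \to 0$ as $N \to \infty$ and the error terms are uniformly small to 
the extent that for any positive t 

\[ \sum_{n \le [t/h_N]} E\left[|\varepsilon_{i,n}^{(N)}|\right] \to 0 \quad \text{for } i = 1, 2, 3 \quad \text{as } N \to \infty. \tag{1.23} \]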


When the processes $\{X^{(N)}\}$ are Markov chains, then the conditioning in (1.21) 
and (1.22) refers to the state of $X_k^{(N)}$. 
Define the continuous time processes 
\[ X^{(N)}(t) = X^{(N)}_{[t/h_N]}, \]
where $[t/h_N]$ is the largest integer not exceeding $t/h_N$. 
Under sufficient smoothness requirements we have the convergence 
\[ X^{(N)}(t) \text{ converges in distribution to } X(t) \text{ for each } t \tag{1.24} \]

and also for any set of time points $0 < t_1 < t_2 < \cdots < t_r$ the finite dimensional 
distributions of 

\[ \{X^{(N)}(t_1), X^{(N)}(t_2), \ldots, X^{(N)}(t_r)\} \text{ converge to those of } \{X(t_1), X(t_2), \ldots, X(t_r)\} \text{ as } N \to \infty, \tag{1.25} \]

where X(t) is a diffusion process whose infinitesimal drift and variance 
coefficients are the functions $\mu(x)$ and $\sigma^2(x)$ of (1.21). 

The resemblance of (1.21) and (1.22) to the infinitesimal moment relations 
(1.2)-(1.4) is manifest. The criteria (1.21) and (1.22) tell us that if a sequence of 
processes closely obeys the moment conditions then there exists a diffusion 
that serves to approximate the processes $\{X^{(N)}\}$ with N large. The proofs of 
(1.24) and (1.25) are technical and will not be presented. 


2: Examples of Diffusion 
A. BROWNIAN MOTION 


Brownian motion is a regular diffusion process on the interval $(-\infty, +\infty)$ 
with $\mu(x) = 0$ and $\sigma^2(x) = \sigma^2$, a constant, for all x. We may compute these 
infinitesimal parameters from the knowledge that $\Delta_h X = X(h) - X(0)$ is normally 
distributed with mean zero and variance $\sigma^2 h$, whence 

\[ E[\Delta_h X \mid X(0) = x] = 0 \quad \text{and} \quad E[(\Delta_h X)^2 \mid X(0) = x] = \sigma^2 h. \]


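We will now verify that the Dynkin condition is satisfied in the stronger form 
such that (1.11) applies with p = 4. Precisely, we have 

\[ \begin{aligned} E[(\Delta_h X)^4 \mid X(0) = x] &= \frac{1}{\sqrt{2\pi h \sigma^2}} \int_{-\infty}^{+\infty} y^4 \exp\left( -\frac{y^2}{2\sigma^2 h} \right) dy \qquad \left( \text{set } \frac{y^2}{\sigma^2 h} = 2\xi \right) \\ &= \frac{4 h^2 \sigma^4}{\sqrt{\pi}} \int_0^{\infty} \xi^{3/2} e^{-\xi}\,d\xi = 3 h^2 \sigma^4 \end{aligned} \]

and (1.11) is confirmed. 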
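
The fourth-moment evaluation can be checked numerically. The following sketch is illustrative only: the function name `fourth_moment`, the midpoint rule, and the truncation of the integral at 12 standard deviations are assumptions made here. It integrates $y^4$ against the normal density with variance $\sigma^2 h$, which can be compared with the closed form $3h^2\sigma^4$; the Gamma-function value $\Gamma(5/2) = 3\sqrt{\pi}/4$ behind the last equality is checked as well.

```python
import math


def fourth_moment(sigma2, h, n=20000, width=12.0):
    """Midpoint-rule approximation of E[(Delta_h X)^4], Delta_h X ~ N(0, sigma2*h).

    n (number of cells) and width (cutoff in standard deviations) are
    illustrative numerical choices.
    """
    var = sigma2 * h
    sd = math.sqrt(var)
    a = -width * sd
    dy = 2.0 * width * sd / n
    total = 0.0
    for k in range(n):
        y = a + (k + 0.5) * dy
        total += y ** 4 * math.exp(-y * y / (2.0 * var))
    return total * dy / math.sqrt(2.0 * math.pi * var)


sigma2, h = 2.0, 0.01
numeric = fourth_moment(sigma2, h)
closed_form = 3.0 * h ** 2 * sigma2 ** 2          # 3 h^2 sigma^4
gamma_check = 4.0 / math.sqrt(math.pi) * math.gamma(2.5)  # should be close to 3
```

Dividing `numeric` by h gives a quantity of order h, which is the sense in which the moment condition holds with p = 4.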

Adding the trend $\mu t$ to a Brownian motion X(t) produces a Brownian 
motion with drift $X(t) + \mu t$. In this case, the drift parameter is $\mu$, while the 
variance parameter remains $\sigma^2$. 


B. ABSORBED AND REFLECTED BROWNIAN MOTION 


Absorbed and reflected Brownian motion are regular diffusion processes 
defined on the common state space $I = [0, \infty)$. Starting from a point X(0) = x 
in the interior of the interval, that is, x > 0, both processes act like Brownian 
motion until the level zero is first reached. Therefore, the infinitesimal parameters 
are $\mu(x) = 0$ and $\sigma^2(x) = \sigma^2$ for $0 < x < \infty$. 

Here we have two diffusion processes with the same state space and the 
same infinitesimal parameters, which shows that these parameters do not 
uniquely define a diffusion process. The infinitesimal parameters govern the 
process evolution only while the process is in the interior of the state space I. 
To fully define a diffusion process, in addition one must adjoin boundary 
conditions to specify the behavior at any endpoints of I that the process may 
reach. Often physical considerations dictate the choice of the boundary 
conditions in a natural way. A more detailed discussion of absorbed and 
reflected Brownian motion was covered in Chapter 7 on Brownian motion (see 
Section 4). 


C. ORNSTEIN-UHLENBECK PROCESS 


This diffusion process has the entire real line $I = (-\infty, \infty)$ as its state space 
and $\mu(x) = -\alpha x$, $\sigma^2(x) = \sigma^2$ as its infinitesimal parameters, where $\alpha$ and $\sigma$ are 
arbitrary positive constants. The infinitesimal drift parameter reflects a restoring 
force directed towards the origin and of a magnitude proportional to the 
distance. 

If Brownian motion represents the position of a particle, the derivative of 
Brownian motion should represent the particle’s velocity. But, as mentioned in 
Chapter 7, this derivative does not exist at any point in time. The Ornstein-Uhlenbeck 
process is an alternative model which overcomes this defect by 
directly modeling the velocity of the particle as a function of time. Two factors 
are considered to affect this velocity during a small time duration. First, the 
frictional resistance of the surrounding medium is assumed to reduce the 
magnitude of the velocity by a proportional amount. Second, there is a change 
in velocity caused by the random collisions with neighboring particles. These 
two factors lead to the specifications $\mu(x) = -\alpha x$ and $\sigma^2(x) = \sigma^2$, respectively. 

In Chapter 7, Section 1, we sketched how Brownian motion was approxi- 
mated by means of a discrete random walk. The Ornstein—Uhlenbeck process 
may be approximated by the Ehrenfest urn model depicting diffusion of 
particles through a permeable membrane. (See Example B of Section 2, 
Chapter 2.) In this model, if there are i particles in urn A, the probability that 
there will be i+ 1 after one time unit is 1 — (i/2N) and the probability of i — 1 
is i/2N, where 2N is the aggregate number of particles in both urns. We might 
expect an appropriate limiting process, in which the time between transitions 
becomes small and the number of particles becomes large, to be a diffusion 
process, where the changes over an interval of time would vary continuously. 
Let $\Delta t$ be the time between transitions and let $X_N(t)$ be the number of particles 
in urn A at time t. Then, with $X_N(t) = x$ and $\Delta X = X_N(t + \Delta t) - X_N(t)$, the 
probability law is 

\[ \Pr\{\Delta X = \pm 1 \mid X_N(t) = x\} = \frac12 \pm \frac{N - x}{2N}. \]

We let N increase and $\Delta t$ decrease while maintaining $N\,\Delta t = 1$, and measure 
fluctuations of the rescaled process about its limiting mean value N in units of 
order $1/\sqrt{N}$. A transition occurs every 1/N time units. The full definition of 
the approximating process is 

\[ Y_N(t) = \frac{X_N([Nt]) - N}{\sqrt{N}}. \]

A unit of time in the limiting process corresponds roughly to N transitions of 
the original process and a unit change in the limiting process corresponds 
roughly to a fluctuation of order $\sqrt{N}$ in the urn composition. Let 
$\Delta Y = \Delta Y(t, \Delta t) = Y_N(t + 1/N) - Y_N(t)$ determine the displacement over the time 
interval $\Delta t = 1/N$. Then 

\[ \begin{aligned} \Pr\left\{\Delta Y = \pm \frac{1}{\sqrt N} \,\Big|\, Y_N(t) = y\right\} &= \Pr\{\Delta X = \pm 1 \mid X_N([Nt]) = x = N + y\sqrt{N}\} \\ &= \frac12 \pm \frac{N - (N + y\sqrt{N})}{2N} \\ &= \frac12 \mp \frac{y}{2\sqrt{N}}. \end{aligned} \]
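We now compute for the urn process $\{Y_N(t)\}$ what are roughly equivalent 
to the infinitesimal parameters for the Ornstein-Uhlenbeck process. Accordingly, 
we have 

\[ E[\Delta Y \mid Y_N(0) = y] = \frac{1}{\sqrt N}\left(\frac12 - \frac{y}{2\sqrt N}\right) - \frac{1}{\sqrt N}\left(\frac12 + \frac{y}{2\sqrt N}\right) = -\frac{y}{N} = -y\,\Delta t, \]

so $\lim_{\Delta t \downarrow 0} E[\Delta Y \mid Y_N(0) = y]/\Delta t = -y$ uniformly for y in bounded intervals. 
We now compute 

\[ \frac{1}{\Delta t} E[(\Delta Y)^2 \mid Y_N(0) = y] = N\left[ \frac1N\left(\frac12 - \frac{y}{2\sqrt N}\right) + \frac1N\left(\frac12 + \frac{y}{2\sqrt N}\right) \right] = 1. \]

A similar computation leads to the relation 

\[ \frac{1}{\Delta t} E[(\Delta Y)^4 \mid Y_N(0) = y] \to 0 \quad \text{as } N \to \infty \]
uniformly for the y variable restricted to bounded sets. 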

Taking account of the discussion of Section 1, the evidence is compelling 
that the processes $\{Y_N(t), t \ge 0\}$ will converge in an appropriate sense to the 
Ornstein-Uhlenbeck process, and this is indeed the case.† This suggests that 
the Ornstein—Uhlenbeck process can be used to calculate quantities that will 
be approximately correct for the Ehrenfest urn model when the number of 
particles is large. 
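
The three moment computations for the rescaled urn process can be reproduced exactly for any finite N. The sketch below is an illustration; the function name `ehrenfest_moments` and the sample values of N and y are assumptions. It uses the one-step law of $\Delta Y$ (steps of size $\pm 1/\sqrt{N}$ with probabilities $\tfrac12 \mp y/(2\sqrt{N})$) and divides each moment by $\Delta t = 1/N$.

```python
import math


def ehrenfest_moments(N, y):
    """One-step moments of Y_N = (X_N - N)/sqrt(N), each divided by dt = 1/N.

    Requires |y| <= sqrt(N) so that the step probabilities lie in [0, 1].
    Returns (first moment, second moment, fourth moment), all over dt.
    """
    step = 1.0 / math.sqrt(N)
    p_up = 0.5 - y / (2.0 * math.sqrt(N))   # Pr{dY = +1/sqrt(N)}
    p_down = 1.0 - p_up                     # Pr{dY = -1/sqrt(N)}
    dt = 1.0 / N
    m1 = step * (p_up - p_down) / dt
    m2 = step ** 2 / dt                     # both steps have the same square
    m4 = step ** 4 / dt
    return m1, m2, m4


m1, m2, m4 = ehrenfest_moments(10 ** 6, 0.7)
```

For this process the first two scaled moments equal -y and 1 for every N, while the fourth equals 1/N, in line with the Ornstein-Uhlenbeck parameters $\mu(y) = -y$ and $\sigma^2(y) = 1$.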

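If $\{V(t), t \ge 0\}$ is an Ornstein-Uhlenbeck process with parameters 
$\mu(x) = -\alpha x$ and $\sigma^2(x) = \sigma^2$, then conditioned on V(t) = x, the distribution of 
V(t + s) is normal with mean 

\[ E[V(t + s) \mid V(t) = x] = x e^{-\alpha s} \]

and variance 

\[ \operatorname{var}[V(t + s) \mid V(t) = x] = \frac{\sigma^2}{2\alpha}\left(1 - e^{-2\alpha s}\right). \]

This will be derived in Section 5. For now let us check the consistency of this 
distribution with the desired infinitesimal parameters. With $h = \Delta t$ and 
$\Delta V = V(t + h) - V(t)$, we have 

\[ \begin{aligned} \lim_{h \downarrow 0} \frac1h E[\Delta V \mid V(t) = x] &= \lim_{h \downarrow 0} \frac1h \left\{ E[V(t + h) \mid V(t) = x] - x \right\} \\ &= \lim_{h \downarrow 0} \frac{x}{h}\left(e^{-\alpha h} - 1\right) \\ &= -\alpha x, \end{aligned} \]

and 

\[ \begin{aligned} \lim_{h \downarrow 0} \frac1h E[(\Delta V)^2 \mid V(t) = x] &= \lim_{h \downarrow 0} \frac1h \left\{ \operatorname{var}[\Delta V \mid V(t) = x] + E[\Delta V \mid V(t) = x]^2 \right\} \\ &= \lim_{h \downarrow 0} \frac1h \left[ \frac{\sigma^2}{2\alpha}\left(1 - e^{-2\alpha h}\right) + x^2\left(e^{-\alpha h} - 1\right)^2 \right] \\ &= \sigma^2. \end{aligned} \]

† We give no formal proof here. 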
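
The two limits can also be observed numerically from the stated conditional mean and variance. This is a minimal sketch: the function name `ou_increment_moments` and the sample parameter values are assumptions chosen for illustration. `math.expm1` is used so that $e^{-\alpha h} - 1$ is computed accurately for small h.

```python
import math


def ou_increment_moments(x, h, alpha, sigma2):
    """Return (E[dV]/h, E[dV^2]/h) given V(t) = x, from the normal transition law.

    Uses mean x*exp(-alpha*h) and variance sigma2*(1 - exp(-2*alpha*h))/(2*alpha).
    """
    mean_shift = x * math.expm1(-alpha * h)                  # E[V(t+h)] - x
    var = -sigma2 * math.expm1(-2.0 * alpha * h) / (2.0 * alpha)
    m1 = mean_shift / h
    m2 = (var + mean_shift ** 2) / h
    return m1, m2


# Illustrative parameters: alpha = 0.8, sigma^2 = 2.5, starting point x = 1.3.
for h in (1e-2, 1e-4, 1e-6):
    m1, m2 = ou_increment_moments(1.3, h, 0.8, 2.5)
    print(h, m1, m2)
```

As h decreases, the printed pairs approach $-\alpha x$ and $\sigma^2$ for the chosen parameters.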


(a) Elementary Transformations of Processes 


Before constructing other important diffusion processes, we determine the form 
and infinitesimal parameters of new processes built from certain transforma- 
tions applied to given ones. A continuous strictly increasing function g may 
be used to transform an arbitrary stochastic process {X(t)} into a new process 
defined by Y(t) = g(X(t)). If {X(t)} is a continuous path Markov process, i.e., 
a diffusion, then so is {Y(t)} since g was assumed continuous and monotone, 
and hence, {Y(t)} will be a Markov process having continuous paths. If, in 
addition, {X(t)} has infinitesimal parameters $\mu(x)$ and $\sigma^2(x)$ given by (1.2) and 
(1.3) and g has two uniformly continuous derivatives $g'$ and $g''$, then Y(t) will 
also have infinitesimal parameters. 

In Theorem 2.1 below we determine the infinitesimal parameters of Y(t) 
= g[X(t)]. A complete proof requires one to operate in terms of truncated 
moments, but this development is not worth the effort ab initio and is not 
given here. We develop the pertinent formulas without full formal rigor. 


Theorem 2.1. Let $\{X(t), t \ge 0\}$ be a regular diffusion whose state space is an 
interval I having endpoints l and r, and suppose {X(t)} has infinitesimal parameters 
$\mu(x)$ and $\sigma^2(x)$. Let g be a strictly monotone function on I with continuous 
second derivative $g''(x)$ for $l < x < r$. Then Y(t) = g[X(t)] defines a regular 
diffusion process on the interval with endpoints g(l) and g(r), and {Y(t)} has 
infinitesimal parameters 

\[ \begin{aligned} \mu_Y(y) &= \tfrac12 \sigma^2(x) g''(x) + \mu(x) g'(x), \\ \sigma_Y^2(y) &= \sigma^2(x) [g'(x)]^2, \end{aligned} \tag{$*$} \]

where y = g(x). 


Remark. Extensions and elaborations on Theorem 2.1 are covered in the 
later developments of Sections 14 and 16. These transformations of diffusions are 
subsumed in what is known as the Ito transformation formula. We present a 
direct discussion producing the infinitesimal parameters (*) which incorporates 
the main steps in their validation. 
Proof. We consider only the case where g is strictly increasing. The strictly 
decreasing case is similar. 

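For g twice continuously differentiable, the Taylor expansion furnishes the 
representation 

\[ g(x + \Delta x) = g(x) + \Delta x\,g'(x) + \tfrac12 (\Delta x)^2 g''(x) + \tfrac12 (\Delta x)^2 \left[ g''(\xi) - g''(x) \right] \]

with $x < \xi < x + \Delta x$. Substituting X(t) = x and $\Delta X = X(t + h) - X(t)$, we 
have 

\[ g(X(t + h)) = g(X(t)) + \Delta X\,g'(X(t)) + \tfrac12 (\Delta X)^2 g''(X(t)) + \tfrac12 (\Delta X)^2 \left[ g''(\xi(\omega)) - g''(X(t)) \right], \tag{2.1} \]

where $\xi(\omega)$ lies between X(t) and X(t + h) and $\omega$ signifies the particular 
realization of the process at hand and serves to emphasize that $\xi(\omega)$ is random. 
We can write (2.1) in the form 

\[ Y(t + h) - Y(t) = \Delta X\,g'(X(t)) + \tfrac12 (\Delta X)^2 g''(X(t)) + \tfrac12 (\Delta X)^2 \left[ g''(\xi(\omega)) - g''(X(t)) \right]. \tag{2.2} \]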

We condition on X(t) = x so that g(x) = g(X(t)) = Y(t) = y, and take expec- 
tations in (2.2). Then dividing by h leads to 


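\[ \lim_{h \downarrow 0} \frac1h E[Y(t + h) - Y(t) \mid Y(t) = y] = \mu(x) g'(x) + \tfrac12 \sigma^2(x) g''(x) + \tfrac12 \lim_{h \downarrow 0} \frac1h E\left[ (\Delta X)^2 \{ g''(\xi(\omega)) - g''(X(t)) \} \right]. \]

The stipulated continuity of $g''$ ensures the convergence of $g''(\xi(\omega))$ to $g''(X(t))$, 
and this in conjunction with the convergence of $h^{-1} E[(\Delta X)^2]$ produces† the 
limit 

\[ \lim_{h \downarrow 0} \frac1h E\left[ (\Delta X)^2 \{ g''(\xi(\omega)) - g''(X(t)) \} \right] = 0, \]

whence 
\[ \mu_Y(y) = \lim_{h \downarrow 0} \frac1h E[Y(t + h) - Y(t) \mid Y(t) = y] = \mu(x) g'(x) + \tfrac12 \sigma^2(x) g''(x) \]

as stated. 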
The infinitesimal variance of Y(t) is ascertained following a similar procedure. 
We square (2.2) to obtain 

\[ [Y(t + h) - Y(t)]^2 = (\Delta X)^2 [g'(X(t))]^2 + R_h, \tag{2.3} \]

where the remainder $R_h$ contains only $(\Delta X)^3$ and higher-order terms. We know 
by (1.4) that $\lim_{h \downarrow 0} h^{-1} E[|\Delta X|^r \mid X(t) = x] = 0$ for $r \ge 3$. A straightforward 
argument on (2.3) then establishes 

\[ \lim_{h \downarrow 0} \frac1h E\left[ \{Y(t + h) - Y(t)\}^2 \mid Y(t) = y \right] = \sigma^2(x) [g'(x)]^2. \]

The condition (1.1) for Y(t) = g[X(t)] is readily verified. Thus Y(t) determines 
a diffusion with infinitesimal parameters as displayed in the statement 
of the theorem. ∎ 

† A rigorous derivation requires consideration of the infinitesimal truncated moments 
mentioned in (1.2') and (1.3'). 


D. GEOMETRIC BROWNIAN MOTION 


We apply the transformation device of Theorem 2.1 to define geometric 
Brownian motion. Let $\{X(t), t \ge 0\}$ be a Brownian motion process with drift $\mu$ 
and diffusion coefficient $\sigma^2$. The process defined by $Y(t) = e^{X(t)}$ is sometimes called 
geometric Brownian motion. The state space is the interval $(0, \infty)$. If 
$t_0 < t_1 < \cdots < t_n$ are time points, the successive ratios 

\[ Y(t_1)/Y(t_0), \; \ldots, \; Y(t_n)/Y(t_{n-1}) \]

are independent random variables, so that, roughly speaking, for geometric 
Brownian motion, the percentage changes over nonoverlapping time intervals 
are independent. With $y = g(x) = e^x$ we have $g'(x) = g''(x) = y$, and hence the 
infinitesimal parameters for geometric Brownian motion are 
\[ \mu_Y(y) = \left(\mu + \tfrac12 \sigma^2\right) y \quad \text{and} \quad \sigma_Y^2(y) = \sigma^2 y^2. \]

Geometric Brownian motion is often used to model prices of assets, say, shares 
of stock, that are traded in a perfect market. Prices are nonnegative and 
exhibit long-run exponential growth (or decay), two properties shared by 


geometric Brownian motion. More recently, geometric Brownian motion has 
featured in describing certain population growth processes. 
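
The formulas (*) of Theorem 2.1 translate directly into a small helper. The sketch below is illustrative: the name `transformed_coefficients` and the numerical values chosen for the drift and variance are assumptions. The caller supplies the coefficient functions of X together with the first and second derivatives of g, and the helper returns the infinitesimal parameters of Y = g(X) at the point y = g(x). It is applied here to the exponential map that defines geometric Brownian motion.

```python
import math


def transformed_coefficients(mu, sigma2, g1, g2, x):
    """Infinitesimal mean and variance of Y = g(X) at y = g(x), per Theorem 2.1.

    mu, sigma2: coefficient functions of X; g1, g2: g' and g'' as callables.
    """
    mu_y = 0.5 * sigma2(x) * g2(x) + mu(x) * g1(x)
    sigma2_y = sigma2(x) * g1(x) ** 2
    return mu_y, sigma2_y


# Geometric Brownian motion: X has constant drift and variance, g(x) = e^x,
# so g' = g'' = exp. The numbers below are illustrative.
drift, variance = 0.05, 0.04
x = 0.3
y = math.exp(x)
mu_y, sigma2_y = transformed_coefficients(
    lambda u: drift, lambda u: variance, math.exp, math.exp, x
)
# Compare with the closed forms (drift + variance/2) * y and variance * y^2.
expected = ((drift + 0.5 * variance) * y, variance * y ** 2)
```

Because the helper only needs g' and g'', it applies to any strictly monotone transformation with a continuous second derivative.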


E. THE BESSEL PROCESS 


In Chapter 7, the Bessel process was defined as the Euclidean distance from the 
origin of an n-dimensional Brownian motion and shown to be a Markov process. 
We shall use Theorem 2.1 to determine the appropriate infinitesimal parameters. 
First let 

\[ Z(t) = X_1(t)^2 + \cdots + X_n(t)^2, \]

where $\{X_i(t), t \ge 0\}$, $i = 1, \ldots, n$, are independent standard Brownian motion 
processes. We shall condition on $X_i(t) = x_i$, $i = 1, \ldots, n$, and write 

\[ X_i(t + \Delta t) = x_i + \Delta X_i \quad \text{and} \quad Z(t + \Delta t) = z + \Delta Z, \]

where $z = x_1^2 + \cdots + x_n^2$. Then 
\[ \begin{aligned} \Delta Z &= [X_1(t + \Delta t)]^2 - x_1^2 + \cdots + [X_n(t + \Delta t)]^2 - x_n^2 \\ &= 2(x_1 \Delta X_1 + \cdots + x_n \Delta X_n) + [(\Delta X_1)^2 + \cdots + (\Delta X_n)^2]. \end{aligned} \]

Since $\Delta X_1, \ldots, \Delta X_n$ are independent and normally distributed with zero means 
and variances $\Delta t$, it follows that 

\[ E[\Delta Z \mid Z(t) = z] = n\,\Delta t \]
and 
\[ E[(\Delta Z)^2 \mid Z(t) = z] = 4(x_1^2 + \cdots + x_n^2)\,\Delta t + o(\Delta t) = 4z\,\Delta t + o(\Delta t), \]

where $o(\Delta t)$ represents the expectations of terms of the form $(\Delta X_i)^4$ and of 
higher orders. We may use the known normal distribution of $\Delta X_i$ to conclude 
that these terms in total are of order less than $\Delta t$. Similarly, by a more tedious 
but straightforward computation (cf. Example A) we find 

\[ E[(\Delta Z)^4 \mid Z(t) = z] = O((\Delta t)^2) \]

and so condition (1.11) holds for the specification p = 4. We conclude 
therefore that Z(t) is a diffusion with the infinitesimal parameters 

\[ \mu(z) = n \quad \text{and} \quad \sigma^2(z) = 4z. \]
The Bessel process is Y(t) = g[Z(t)] for $g(z) = \sqrt{z}$. If y = g(z), then 
\[ \mu(z) = n, \quad \sigma^2(z) = 4y^2, \quad g'(z) = \frac{1}{2\sqrt{z}} = \frac{1}{2y}, \quad \text{and} \quad g''(z) = -\frac{1}{4z^{3/2}} = -\frac{1}{4y^3}. \]

We apply Theorem 2.1 to obtain for the infinitesimal parameters of the Bessel 
process 

\[ \mu_Y(y) = \frac{n - 1}{2y} \quad \text{and} \quad \sigma_Y^2(y) = 1. \]


The transition density for the Bessel process was evaluated in Section 6 of 
Chapter 7 and is reproduced in Example 3 of Section 6 which follows. 

For n = 1, the Bessel process can be identified with reflecting Brownian 
motion, for which $\mu_Y(y) = 0$ and $\sigma_Y^2(y) = 1$ as indicated earlier. 


F. WRIGHT-FISHER (HAPLOID) GENETIC MODELS AND ASSOCIATED DIFFUSION 
APPROXIMATIONS† 


Consider a population of constant size N individuals composed of two types 
A and a. Suppose the current state (the number of A-types) is i and therefore the 
other N - i individuals are of a-type. The next generation is produced subject 
to the influence of mutation, selection, and sampling forces. We stipulate that 
mutation converts at birth an A-type to an a-type and an a-type to an A-type 
with probabilities $\alpha$ and $\beta$, respectively. Given the parental population comprised 
of i A-types and N - i a-types, the expected fraction of A-types 
after mutation is $(i/N)(1 - \alpha) + (1 - i/N)\beta$ and of the a-types is 
$(i/N)\alpha + (1 - i/N)(1 - \beta)$. We next stipulate that the relative survival abilities 
of the two types A and a in contributing to the next generation are in the ratio 
of 1 + s to 1 where s is small and positive. Thus, type A is selectively superior 
to type a. Taking account of these mutation and selection forces, the expected 
fraction of mature A-types before reproduction is 

\[ p_i = \frac{(1 + s)[i(1 - \alpha) + (N - i)\beta]}{(1 + s)[i(1 - \alpha) + (N - i)\beta] + [i\alpha + (N - i)(1 - \beta)]}. \tag{2.4} \]

The Wright-Fisher model postulates that the composition of the next generation 
is determined through N binomial trials, where the probability of producing 
an A-type offspring on each trial is $p_i$ as given in (2.4). Thus the population 
process {X(t) = number of A-types in the tth generation} evolves as a Markov 
chain governed by the transition probability matrix 

\[ P_{ij} = \binom{N}{j} p_i^{\,j} (1 - p_i)^{N - j}. \tag{2.5} \]

† Consult also Chapter 2, pp. 55-58. 


Note that the average change $p_i - i/N$ of the fraction of A-types at the end of 
a generation cycle expresses the mean changes accountable to selection 
and mutation as in a deterministic infinite population process. The statistical 
fluctuations due to the finite population size are reflected in the probabilistic 
transition matrix (2.5) and its inherent sampling characteristics. Even for this 
explicit Markov chain it is rarely possible to compute in analytic form relevant 
probabilistic functionals. 

However, for N large the process can be approximated by a number of 
diffusion processes depending on the relative orders of magnitude of the 
parameters $\alpha$, $\beta$, and s. We provide some basis for this statement in the 
remainder of this section. For instructional purposes we consider a series of 
cases all of independent interest. 


(a) Wright—Fisher Gene Frequency Diffusion Model Involving Only 
Mutation Effects 


\[ \alpha = \frac{\gamma_1}{N}, \quad \beta = \frac{\gamma_2}{N} \quad \text{with } \gamma_1 > 0, \; \gamma_2 > 0, \; s = 0. \]


(The intensity of mutation of $A \to a$ in the population, $N\alpha = \gamma_1$, is positive and 
finite, and similarly for $N\beta = \gamma_2$.) Consider the associated process 


\[ Y_N(t) = \frac{X([Nt])}{N}. \tag{2.6} \]
One unit t = 1 of the $Y_N(t)$ process corresponds to the lapse of N generations 
in the X(t) process and, of course, $Y_N(t)$ represents the fraction of the A-type in 
the population at generation time [Nt]. The assumptions $\alpha = \gamma_1/N$ and 
$\beta = \gamma_2/N$ have a convenient intuitive interpretation: The rate of mutation is 
constant per unit of time of the scaled process, which is N generations in the X 
process. 

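We adopt the notation (with h = 1/N) 

\[ \Delta Y_N(t, h) = Y_N\left(t + \frac1N\right) - Y_N(t) = \frac{X([Nt] + 1) - X([Nt])}{N} \]

and compute 
\[ E\left[ \Delta Y_N(t, h) \,\Big|\, Y_N(t) = \frac{i}{N} \right] = E\left[ Y_N\left(t + \frac1N\right) - \frac{i}{N} \,\Big|\, Y_N(t) = \frac{i}{N} \right]. \tag{2.7} \]

Under the conditioning as indicated, $N\,Y_N(t + 1/N)$ is distributed binomially 
with parameters $(p_i, N)$. Thus the expectation in (2.7) is 

\[ p_i - \frac{i}{N} = -\alpha \frac{i}{N} + \left(1 - \frac{i}{N}\right)\beta = \frac1N \left[ -\gamma_1 \frac{i}{N} + \gamma_2 \left(1 - \frac{i}{N}\right) \right]. \tag{2.8} \]

For h = 1/N, after combining (2.7) and (2.8) and letting $i/N \to \xi$ as $N \to \infty$, 
we obtain 

\[ \lim_{h \downarrow 0} \frac1h E[\Delta Y_N(t, h) \mid Y_N(t) = \xi] = -\gamma_1 \xi + (1 - \xi)\gamma_2 \tag{2.9} \]

and the convergence holds uniformly for $0 \le \xi \le 1$. 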
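Next we shall compute 

\[ \frac1h E\left[ \{\Delta Y_N(t, h)\}^2 \mid Y_N(t) = \xi \right] = N E\left[ \{\Delta Y_N(t, h)\}^2 \mid Y_N(t) = \xi \right]. \tag{2.10} \]

Again using elementary moment formulas for the binomial distribution, (2.10) 
is calculated (with $\xi = i/N$) to be equal to 

\[ N \left\{ \frac{1}{N^2}\left[ N p_i (1 - p_i) + N^2 p_i^2 \right] - 2 \frac{i}{N} p_i + \left( \frac{i}{N} \right)^2 \right\}. \]

Inserting $\alpha = \gamma_1/N$, $\beta = \gamma_2/N$ and applying a little manipulation simplifies the 
above to 
\[ \frac{i}{N}\left(1 - \frac{i}{N}\right) + O\left(\frac1N\right), \]
where O(1/N) is of order at most 1/N. Thus, we obtain (with h = 1/N) 

\[ \lim_{h \downarrow 0} \frac1h E\left[ \{\Delta Y_N(t, h)\}^2 \mid Y_N(t) = \xi \right] = \xi(1 - \xi) \tag{2.11} \]

(the convergence taking place uniformly for $0 \le \xi \le 1$). 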


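A more tedious but straightforward evaluation leads to the relation 

\[ \lim_{N \to \infty} \frac1h E\left[ \{\Delta Y_N(t, h)\}^4 \mid Y_N(t) = \xi \right] = 0 \tag{2.12} \]

and again the convergence is uniform with respect to $0 \le \xi \le 1$. 
Indeed, rewriting (2.12) in terms of X yields 

\[ \begin{aligned} \frac{1}{N^3} & E\left[ \{X([Nt] + 1) - X([Nt])\}^4 \mid X([Nt]) = N\xi \right] \\ &= \frac{1}{N^3} \Big( E[\{X([Nt] + 1)\}^4 \mid X([Nt]) = N\xi] - 4N\xi\, E[\{X([Nt] + 1)\}^3 \mid X([Nt]) = N\xi] \\ &\qquad + 6(N\xi)^2 E[\{X([Nt] + 1)\}^2 \mid X([Nt]) = N\xi] - 4(N\xi)^3 E[X([Nt] + 1) \mid X([Nt]) = N\xi] + (N\xi)^4 \Big) \\ &\qquad \text{(and abbreviating } q_\xi = N^{-1} E[X([Nt] + 1) \mid X([Nt]) = N\xi] \text{)} \\ &= \frac{1}{N^3} \left\{ N^4 (q_\xi^4 - 4\xi q_\xi^3 + 6\xi^2 q_\xi^2 - 4\xi^3 q_\xi + \xi^4) + 6 q_\xi (1 - q_\xi) N^3 (q_\xi^2 - 2\xi q_\xi + \xi^2) + O(N^2) \right\} \\ &= \frac{1}{N^3} \left[ N^4 (q_\xi - \xi)^4 + O(N^3)(q_\xi - \xi)^2 + O(N^2) \right]. \end{aligned} \]

Recalling from (2.8) and (2.9) that $q_\xi - \xi = O(1/N)$ we achieve the desired 
result. 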


Comparison of (2.9), (2.11), and (2.12) suggests strongly that the processes 
$Y_N(t) = X([Nt])/N$ converge as $N \to \infty$ to a diffusion process Y(t) whose state 
space is the unit interval [0,1] with drift coefficient $\mu(\xi) = -\gamma_1 \xi + (1 - \xi)\gamma_2$ 
and variance coefficient $\sigma^2(\xi) = \xi(1 - \xi)$. The above results can indeed be rigorously 
validated. 


In the population genetics literature the process Y(t) has been used to 
compute important functionals of X(t) with quite good accuracy. Several of 


these calculations will be illustrated later. 
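
The limits (2.9) and (2.11) can be checked against the exact binomial moments for a large but finite N. The sketch below is an illustration under stated assumptions: the function names `wright_fisher_p` and `scaled_moments` and the sample values of N, i, $\gamma_1$, $\gamma_2$ are chosen here, not taken from the text. It evaluates $p_i$ from (2.4) and then uses the mean $p_i$ and variance $p_i(1 - p_i)/N$ of the next-generation frequency.

```python
def wright_fisher_p(i, N, alpha, beta, s=0.0):
    """Expected fraction of mature A-types, formula (2.4)."""
    a = (1.0 + s) * (i * (1.0 - alpha) + (N - i) * beta)
    b = i * alpha + (N - i) * (1.0 - beta)
    return a / (a + b)


def scaled_moments(i, N, gamma1, gamma2):
    """Return (N*E[dY], N*E[dY^2]) given Y_N(t) = i/N in the mutation-only model.

    Uses N*Y_N(t + 1/N) ~ Binomial(N, p_i) with alpha = gamma1/N, beta = gamma2/N.
    """
    p = wright_fisher_p(i, N, gamma1 / N, gamma2 / N)
    xi = i / N
    m1 = N * (p - xi)
    # E[dY^2] = Var(Y') + (E[Y'] - xi)^2 with Var(Y') = p(1-p)/N.
    m2 = p * (1.0 - p) + N * (p - xi) ** 2
    return m1, m2


# Illustrative values: xi = 0.3, gamma1 = 2, gamma2 = 1.
N, i, gamma1, gamma2 = 10 ** 6, 300000, 2.0, 1.0
m1, m2 = scaled_moments(i, N, gamma1, gamma2)
xi = i / N
limit_drift = -gamma1 * xi + (1.0 - xi) * gamma2
limit_variance = xi * (1.0 - xi)
```

For these values `m1` and `m2` should sit close to `limit_drift` and `limit_variance`, with a discrepancy of order 1/N.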
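(b) Wright-Fisher Gene Frequency Diffusion Model Involving Selection 


Consider $\alpha = \beta = 0$ (no mutation) but with the presence of selection 
differences such that $s = \sigma/N$ with $\sigma$ finite. Consider again the process $Y_N(t)$ as 
in (2.6). We make the same computations as in (2.7). These give 

\[ \begin{aligned} E\left[ \Delta Y_N(t, h) \,\Big|\, Y_N(t) = \frac{i}{N} \right] &= p_i - \frac{i}{N} \\ &= \frac{(1 + s)\,i/N}{(1 + s)\,i/N + (1 - i/N)} - \frac{i}{N} \\ &= \frac{(1 + \sigma/N)(i/N)}{(1 + \sigma/N)(i/N) + (1 - i/N)} - \frac{i}{N} \qquad \text{(inserting } s = \sigma/N \text{)} \\ &= \frac{(1/N)\,\sigma (i/N)(1 - i/N)}{1 + (\sigma/N)(i/N)} \\ &= \frac1N\, \sigma \frac{i}{N} \left(1 - \frac{i}{N}\right) + O\left(\frac{1}{N^2}\right) \qquad \text{(expanding the denominator)} \end{aligned} \tag{2.13} \]

so that for h = 1/N we derive on the basis of (2.13) the limit relation 

\[ \lim_{h \downarrow 0} N E[\Delta Y_N(t, h) \mid Y_N(t) = \xi] = \sigma \xi (1 - \xi). \tag{2.14} \]

In a similar manner we obtain 

\[ \lim_{h \downarrow 0} N E\left[ \{\Delta Y_N(t, h)\}^2 \mid Y_N(t) = \xi \right] = \xi(1 - \xi) \tag{2.15} \]

and also (2.12) holds. 
In view of (2.14) and (2.15) we expect the associated limiting diffusion Y(t) 
to have coefficients 

\[ \mu(\xi) = \sigma \xi (1 - \xi) \quad \text{and} \quad \sigma^2(\xi) = \xi(1 - \xi). \tag{2.16} \]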


(c) Two-Types Growth Model 


The scaling used to create the Y process from the X process in (a) converted 
the actual number of A-types into the frequency of the A-types in the 
population. The assumption was that the number of A-types was of order N 
and that fluctuations of order N were of primary interest. For a large 
population, the frequency of A-types might be relatively low (o(N)); hence 
scaling by N could conceal the fact that the number of A-types is large, for 
example of order $\sqrt{N}$. In the previous cases we dealt with a sequence of 
processes $Y_N(t)$ ($N \to \infty$) tracing the fluctuations of the fraction of the A-type in 
the population. Where the A- and a-types are abundant, fluctuations in $Y_N(t)$ 
are visible provided observations are made approximately each N generations. 
In what follows, we consider a scaling by $\sqrt{N}$. Such a model is useful in 
determining whether introduction of a new type into a population on a lower 
order of magnitude than N (the population size) will result in extinction or 
spread of the new type. 

Consider now the case where the number of A-types is of the order $z\sqrt{N}$ 
($0 < z < \infty$) so that for N large the fraction of A-types is virtually zero. To 
study the fine stochastic structure of this situation we consider the sequence of 
processes 

\[ Z_N(t) = \frac{X([\sqrt{N}\,t])}{\sqrt{N}}, \qquad Z_N(0) = z, \quad 0 < z < \infty. \tag{2.17} \]

This process focuses on the fluctuations of the number of A-types ranging in 
the order of magnitude $\sqrt{N}$. We have adjusted the time scale so that 
fluctuations in this process are pronounced provided observations are made 
at time epochs of approximately $\sqrt{N}$ generations. Consider the conditions on 
the mutation parameters as in Model (a), $\alpha = \gamma_1/N$ and $\beta = \gamma_2/N$, s = 0. We set 
$h = 1/\sqrt{N}$ and introduce 

\[ \Delta_N Z(t, h) = Z_N\left(t + \frac{1}{\sqrt N}\right) - Z_N(t) = \frac{X([\sqrt{N}\,t] + 1) - X([\sqrt{N}\,t])}{\sqrt{N}}. \tag{2.18} \]


The computations of the moments of (2.18) run parallel to the preceding 
cases. With $i = [z\sqrt{N}]$, substitution for $p_i$ yields 

\[ \begin{aligned} \sqrt{N}\, E[\Delta_N Z(t, h) \mid Z_N(t) = z] &= \sqrt{N}\left(\sqrt{N}\,p_i - z\right) \\ &= z\sqrt{N}(1 - \alpha) + (N - z\sqrt{N})\beta - z\sqrt{N} \to \gamma_2 \quad \text{as } N \to \infty. \end{aligned} \tag{2.19} \]

A similar evaluation of the second approximating infinitesimal moment 
leads to 

\[ \sqrt{N}\, E\left[ \{\Delta_N Z(t, h)\}^2 \mid Z_N(t) = z \right] \to z \quad \text{as } N \to \infty \text{ uniformly in any compact region of the } z \text{ variable,} \tag{2.20} \]
and finally 

\[ \sqrt{N}\, E\left[ \{\Delta_N Z(t, h)\}^4 \mid Z_N(t) = z \right] \to 0 \quad \text{as } N \to \infty \text{ uniformly for compact regions of the } z \text{ variable} \tag{2.21} \]
obtains. From interpretation of (2.19)-(2.21), it is apparent that the appropriate 
limiting diffusion $Z(t) = \lim Z_N(t)$ has state space $I = (0, \infty)$ with 

\[ \mu(z) = \gamma_2 = \text{constant} \quad \text{and} \quad \sigma^2(z) = z. \tag{2.22} \]


This diffusion can be used to study the fine fluctuations of the number of A-types 
when they are of the order $\sqrt{N}$. The variable Z(t) reaching $\infty$ means 
practically that the number of A-types has overgrown order $\sqrt{N}$, and Z(t) 
shrinking to 0 is interpreted as meaning that the A-type has become more rare 
than order $\sqrt{N}$. 


(d) Different Orders of Mutation Rates 


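Here $\alpha \sim \gamma_1/N^{\delta}$, $\beta \sim \gamma_2/N$, $0 < \delta < 1$ with $0 < \gamma_1, \gamma_2 < \infty$, s = 0. Notice 
that the mutation rate of $A \to a$ is of a substantially larger order of magnitude 
than the reverse mutation rate. In this case we consider the sequence of processes 

\[ W_N(t) = \frac{X([N^{\delta} t])}{N^{\delta}}. \tag{2.23} \]
Notice that the right time scale to reflect changes involves observations about 
$N^{\delta}$ generations apart. The level of the A-type where fluctuations are distinguishable 
is that the number of A-types be of the order $N^{\delta}$. Straightforward 
calculations paraphrasing the analysis of the previous cases show that 

\[ \begin{aligned} N^{\delta} E\left[ \frac{X([N^{\delta} t] + 1) - X([N^{\delta} t])}{N^{\delta}} \,\Big|\, X([N^{\delta} t]) = x N^{\delta} \right] &\to \gamma_2 - \gamma_1 x, \\ N^{\delta} E\left[ \left\{ \frac{X([N^{\delta} t] + 1) - X([N^{\delta} t])}{N^{\delta}} \right\}^2 \,\Big|\, X([N^{\delta} t]) = x N^{\delta} \right] &\to x, \end{aligned} \tag{2.24} \]

and the higher-order infinitesimal moments go to zero. This suggests that the 
appropriate limiting process W(t) for $W_N(t)$ is a diffusion with 

\[ \mu(x) = \gamma_2 - \gamma_1 x \quad \text{and} \quad \sigma^2(x) = x. \tag{2.25} \]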


This process is sometimes called a Laguerre diffusion since Laguerre ortho- 
gonal polynomials are involved. This diffusion features also in models of 
population growth and branching processes. Further analysis of this case is 
deferred to Section 13. 


(e) The Growth Process for Rare Types 


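Here $\beta \sim \gamma_2/N$, $\alpha$ fixed but not small, $0 < \alpha < 1$, and $s = 0$.

No associated limiting diffusion is possible, but if the A-type is exceptionally
rare, then the process of the actual number of A-types $X_N(t)$ with time scale
one unit corresponding to one generation is described by a limiting process
$\{U(t), t \ge 0\}$ composed of a branching process part plus an immigration
process.
To see this, we consider the generating function of $X_N(t + 1)$ conditioned
on $X_N(t) = i$ (here $s$ denotes the dummy variable of the generating
function, not the selection coefficient):

$$\begin{aligned}
\sum_{k=0}^{N} P_{ik}s^k &= \sum_{k=0}^{N} \Pr\{X_N(t + 1) = k \mid X_N(t) = i\}\,s^k \\
&= [p_i s + (1 - p_i)]^N = [1 + (s - 1)p_i]^N \\
&= \left[1 + (s - 1)\left\{\frac{i}{N}(1 - \alpha) + \beta\left(1 - \frac{i}{N}\right)\right\}\right]^N \\
&= \left[1 + \frac{(s - 1)i(1 - \alpha)}{N} + \frac{\gamma_2(s - 1)}{N}\left(1 - \frac{i}{N}\right)\right]^N,
\end{aligned} \tag{2.26}$$

and as $N \to \infty$ the generating function clearly approaches [since
$(1 + a/N)^N \to e^a$]

$$\sum_{k=0}^{\infty} \Pr\{U(t + 1) = k \mid U(t) = i\}\,s^k = \left(e^{(s-1)(1-\alpha)}\right)^i e^{\gamma_2(s-1)}. \tag{2.27}$$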


Thus the process $U(t)$ behaves as an ordinary branching process whose offspring
distribution is Poisson with parameter $1 - \alpha$. In addition, in each
generation a number of A-type individuals following a Poisson distribution with
parameter $\gamma_2$ immigrates into the system.
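
The convergence of (2.26) to (2.27) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `binomial_pgf` and `limit_pgf` are hypothetical, and the sketch assumes the no-selection sampling probability $p_i = (i/N)(1 - \alpha) + (1 - i/N)\beta$ with $\beta = \gamma_2/N$ exactly.

```python
import math

def binomial_pgf(s, i, N, alpha, gamma2):
    """Generating function [1 + (s - 1) p_i]^N of X_N(t+1) given X_N(t) = i."""
    beta = gamma2 / N  # assumption: beta equals gamma2 / N exactly
    p_i = (i / N) * (1 - alpha) + (1 - i / N) * beta
    return (1 + (s - 1) * p_i) ** N

def limit_pgf(s, i, alpha, gamma2):
    """Right side of (2.27): i Poisson(1 - alpha) families plus Poisson(gamma2) immigrants."""
    return math.exp((s - 1) * (1 - alpha)) ** i * math.exp(gamma2 * (s - 1))

def gap(N, s=0.3, i=5, alpha=0.4, gamma2=2.0):
    """Absolute difference between the finite-N and limiting generating functions."""
    return abs(binomial_pgf(s, i, N, alpha, gamma2) - limit_pgf(s, i, alpha, gamma2))

if __name__ == "__main__":
    for N in (10**2, 10**4, 10**6):
        print(N, gap(N))
```

The gap shrinks as $N$ grows, which is the content of the limit statement; the parameter values used are arbitrary illustrations.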


(f) The Ornstein-Uhlenbeck Process Again


Here $0 < \alpha, \beta < 1$ are not small mutation rates and there are no
selection effects; $s = 0$. As $N \to \infty$ the Wright-Fisher process (2.5)
tends to a deterministic limit in the sense that the fraction of the A-type
converges so that

$$\frac{X_N(t)}{N} \to \frac{\beta}{\alpha + \beta} \quad \text{with probability 1.}$$

It is possible to make a fine analysis of the fluctuation behavior of $X_N(t)$
about its mean $N\beta/(\alpha + \beta)$. To do this we introduce the sequence
of random processes

$$V_N(t) = \frac{X_N(t) - N\beta/(\alpha + \beta)}{\sqrt{N\alpha\beta(\alpha + \beta)^{-2}}}.$$

It can be established that $V_N(t)$ for $N$ large is approximated by an
Ornstein-Uhlenbeck process having parameters $\mu(x) = -(\alpha + \beta)x$ and
$\sigma^2(x) = 1$; cf. Example C. We shall not enter into details.

To sum up, we have highlighted six different processes associated with the
Wright-Fisher model, all relevant for different ranges of the parameter values.
Uses of the approximating diffusions in calculating functionals of the process
will be elaborated in Section 4.


(g) A Model with Selection Parameters Varying Stochastically
in Time (Extension of Case b)†


Fluctuations of selection intensities may be caused by random changes in 
environment or genetic background affecting both large and small pop- 
ulations. 

Consider a population of N individuals reproducing in discrete generations
manifesting two types A and a with fitness coefficients as indicated.


                                A                     a
fitnesses in generation t:      $1 + \sigma^{(t)}$    $1 + \rho^{(t)}$


The selection intensities $\{\sigma^{(t)}, \rho^{(t)};\ t = 0, 1, 2, \ldots\}$
are assumed to fluctuate over time in a random and/or systematic manner
reflecting a changing ecological or genetic environment. Population size is
held constant at N individuals per generation.

The fluctuations in the number of A-types (and a-types) over successive
generations are generated in accordance with the standard Wright-Fisher
Markov chain process. Specifically, the probability law governing the number
of A-types in generation $t$, given that the selection parameters are
$(\sigma^{(t)}, \rho^{(t)})$ and the population consists of $i$ A-types (and
accordingly $N - i$ a-types) in generation $t - 1$, is that of a binomial
distribution with parameters $(N, p_i^{(t)})$, where

$$p_i^{(t)} = \frac{(1 + \sigma^{(t)})\,i}{(1 + \sigma^{(t)})\,i + (1 + \rho^{(t)})(N - i)}. \tag{2.28}$$


Thus, the stochastic model is structured by superimposing a binomial sampling
scheme where the average changes are determined by (2.28). Conditioned
on an outcome of the environmental process, i.e., for each realization of the
selection values, we construct a time inhomogeneous Markov chain on the state
space $\{0, 1, \ldots, N\}$ with transition probability law from generation
$t - 1$ to $t$ given by

$$\Pr\{X(t) = j \mid X(t - 1) = i\} = \binom{N}{j}\big(p_i^{(t)}\big)^j\big(1 - p_i^{(t)}\big)^{N-j},$$

where $p_i^{(t)}$ is determined in (2.28).
In essence, there are three sources of variation affecting the changes in the
population numbers of the A-types over successive generations:


(i) Sampling variance. Statistical or sampling fluctuation stemming
from small population size. Formally, its effects are expressed in the binomial
sampling mechanism.

(ii) Within generation selection variance. The randomness of the selection
intensities in a given generation is reflected in $(\sigma^{(t)}, \rho^{(t)})$,
which is a random vector for generation $t$ usually having a known or estimable
distribution function. Randomness here is tied to the ecological and genetic
background of the immediate time epoch.

(iii) Between generation selection variance. The correlations and general
dependence relationships of the selection process over time, that is,
$\{\sigma^{(t)}, \rho^{(t)};\ t \ge 0\}$, may constitute a complex stochastic
process embodying a hierarchy of dependence associations. The simplest
assumption would have $S_t = \{\sigma^{(t)}, \rho^{(t)}\}$, $t = 0, 1, 2, \ldots$,
mutually independent, identically distributed random vectors.


† This case involves a rather detailed analysis and might be skipped on first reading.


Our objective is to evaluate more completely the relative effects resulting 
from the two different random factors: random sampling of gametes (induced 
by the small constant population size) and random fluctuation of selection 
intensities ascribable mainly to temporal environmental changes. When popu- 
lation size is finite and constant, then ultimate fixation of the A- or a-type 
necessarily occurs. We focus on two main problems. First the determination of 
the probability of fixation of the A-type (the so-called absorption probabilities) 
in terms of the known information on the selection process and the initial 
numbers of A-types; second, the calculation of the expected time to fixation. 
The main emphasis is to contrast qualitatively and quantitatively the findings 
on these problems for the constant selection environment with the case of a 
fluctuating pattern of selection intensities. We will again use diffusion 
approximations. 

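We make the following assumptions for $\sigma^{(t)}$ and $\rho^{(t)}$:

$$\begin{aligned}
E[\sigma^{(t)}] &= \frac{\gamma}{N} + o\!\left(\frac{1}{N}\right), \\
E[\rho^{(t)}] &= \frac{\delta}{N} + o\!\left(\frac{1}{N}\right), \\
E[(\sigma^{(t)})^2] &= \frac{v_1}{N} + o\!\left(\frac{1}{N}\right), \\
E[(\rho^{(t)})^2] &= \frac{v_2}{N} + o\!\left(\frac{1}{N}\right), \\
E[\sigma^{(t)}\rho^{(t)}] &= \frac{r}{N} + o\!\left(\frac{1}{N}\right),
\end{aligned} \tag{2.29}$$

and that all terms higher than second degree are of smaller order than $1/N$.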

There are now two stochastic processes involved here describing the
changes in A-type frequencies: the $X(t)$ process, in which randomness is due to
both the sampling procedure and the variable selection forces; and the process
$\hat{X}(t) = Np_x$, which describes the number of A-types as a result of the
selection process alone (with no sampling). We shall first compute the drift and
variance of the diffusion associated with the $\hat{X}(t)$ process.
Let $\hat{Y}_N(t) = \hat{X}([Nt])/N$. Then

$$\begin{aligned}
\hat{\mu}(x) &= \lim_{N \to \infty} N E\big[\hat{Y}_N(t + 1/N) - \hat{Y}_N(t) \,\big|\, \hat{Y}_N(t) = x\big] \\
&= \lim_{N \to \infty} E\big[\hat{X}([Nt] + 1) - \hat{X}([Nt]) \,\big|\, \hat{X}([Nt]) = Nx\big] \\
&= \lim_{N \to \infty} E[N(p_x - x)],
\end{aligned}$$

where

$$p_x = \frac{x(1 + \sigma)}{x(1 + \sigma) + (1 - x)(1 + \rho)}, \qquad p_x - x = \frac{x(1 - x)(\sigma - \rho)}{1 + x\sigma + (1 - x)\rho}.$$

We expand $p_x - x$ in a power series and discard terms higher than second
order, in line with our assumption on $\sigma$ and $\rho$. This gives

$$\begin{aligned}
\hat{\mu}(x) &= \lim_{N \to \infty} N x(1 - x) E\big[(\sigma - \rho)\{1 - x\sigma - (1 - x)\rho\}\big] \\
&= x(1 - x)[\gamma - \delta + v_2 - r - x(v_1 + v_2 - 2r)] \\
&= x(1 - x)[\Gamma + V(\tfrac{1}{2} - x)],
\end{aligned} \tag{2.30}$$

where $\Gamma = \gamma - \delta + \tfrac{1}{2}(v_2 - v_1)$ and $V = v_1 + v_2 - 2r$.
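For the variance term,

$$\begin{aligned}
\hat{\sigma}^2(x) &= \lim_{N \to \infty} N E\big[\{\hat{Y}_N(t + 1/N) - \hat{Y}_N(t)\}^2 \,\big|\, \hat{Y}_N(t) = x\big] \\
&= \lim_{N \to \infty} N E\big[(p_x - x)^2 \,\big|\, \hat{X}([Nt]) = Nx\big] \\
&= x^2(1 - x)^2 V.
\end{aligned}$$

Thus, we have determined that

$$\hat{\mu}(x) = x(1 - x)[\Gamma + V(\tfrac{1}{2} - x)], \qquad \hat{\sigma}^2(x) = x^2(1 - x)^2 V.$$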
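
As a numerical illustration of these two limits, the sketch below evaluates the selection-only moments exactly for one particular environment law and compares them with the limiting expressions. It is not from the original text: the two-point distribution for the selection intensities, the independence of the two intensities (so that $r = 0$), and the function names are assumptions made only for the example.

```python
from itertools import product

def selection_moments(x, N, gamma, delta, v1, v2):
    """Return (N*E[p_x - x], N*E[(p_x - x)^2]) for a two-point law of (sigma, rho).

    Assumption: sigma = gamma/N +/- sqrt(v1/N) and rho = delta/N +/- sqrt(v2/N),
    each sign with probability 1/2, independently, so that r = 0 in (2.29).
    """
    sigmas = [gamma / N + sign * (v1 / N) ** 0.5 for sign in (+1, -1)]
    rhos = [delta / N + sign * (v2 / N) ** 0.5 for sign in (+1, -1)]
    mean = second = 0.0
    for sigma, rho in product(sigmas, rhos):  # four equally likely outcomes
        p_x = x * (1 + sigma) / (x * (1 + sigma) + (1 - x) * (1 + rho))
        mean += 0.25 * (p_x - x)
        second += 0.25 * (p_x - x) ** 2
    return N * mean, N * second

def limiting_parameters(x, gamma, delta, v1, v2, r=0.0):
    """Limiting drift and variance x(1-x)[Gamma + V(1/2 - x)] and x^2(1-x)^2 V."""
    Gamma = gamma - delta + 0.5 * (v2 - v1)
    V = v1 + v2 - 2 * r
    return x * (1 - x) * (Gamma + V * (0.5 - x)), x**2 * (1 - x) ** 2 * V

if __name__ == "__main__":
    print(selection_moments(0.3, 10**6, 1.0, 0.5, 0.8, 0.4))
    print(limiting_parameters(0.3, 1.0, 0.5, 0.8, 0.4))
```

For large $N$ the two printed pairs agree to several decimal places, since the neglected terms are of third and higher degree in the selection intensities.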


There is a simple relationship between the infinitesimal parameters of the
processes induced by $X(t)$ and $\hat{X}(t)$. For the drift and variance, the
random effects due to the sampling effects and the selection process are
additive. In fact, the drift and variance due to sampling alone are 0 and
$x(1 - x)$, respectively. We find for the $X(t)$ process,

$$\begin{aligned}
\mu(x) &= \hat{\mu}(x) + 0 = x(1 - x)[\Gamma + V(\tfrac{1}{2} - x)], \\
\sigma^2(x) &= \hat{\sigma}^2(x) + x(1 - x) = x(1 - x)[1 + Vx(1 - x)].
\end{aligned} \tag{2.31}$$

In fact, a more general statement concerning the additivity of stochastic
effects is true. Suppose we incorporate mutation into the last mentioned
model. That is, each generation consists of mutation, followed by selection,
and finally binomial sampling. The process $Z(t)$, which equals the number of
A-types, has a transition matrix with entries

$$P_{ij} = \binom{N}{j} p_i^j (1 - p_i)^{N-j},$$

where

$$p_i = \frac{[i(1 - \alpha) + \beta(N - i)](1 + \sigma)}{[i(1 - \alpha) + \beta(N - i)](1 + \sigma) + [N - i(1 - \alpha) - \beta(N - i)](1 + \rho)}.$$

The parameters $\alpha$ and $\beta$ and the random variables $\sigma = \sigma^{(t)}$
and $\rho = \rho^{(t)}$ are defined as before. We construct the scaled process
$W_N(t) = Z([Nt])/N$. The drift and variance of the diffusion induced by
$W_N(t)$ (as $N \to \infty$) are the sum of
the drifts and variances, respectively, for each of the forces (mutation, fluctu- 
ating selection, sampling) acting on the population. Moreover, this result is 
independent of the order in which selection and mutation are performed. (The 
student should verify all this.) 

The parameters $\Gamma$ and $V$ carry the following motivation. Suppose the
frequency of A, equal to $x$, was low. Then the fraction of A in the next
generation due to selection would be

$$p_x = \frac{x(1 + \sigma)}{x(1 + \sigma) + (1 - x)(1 + \rho)} \approx x\left(\frac{1 + \sigma}{1 + \rho}\right)$$

($\sigma = \sigma^{(t)}$ and $\rho = \rho^{(t)}$ depending on the generation time).
After $n$ generations, we would have

$$x_n \approx x \prod_{i=1}^{n} \frac{1 + \sigma^{(i)}}{1 + \rho^{(i)}}.$$


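To determine the asymptotic behavior of this quantity we compute the
expectation of $\log\{\prod_{i=1}^{N}[(1 + \sigma^{(i)})/(1 + \rho^{(i)})]\}$, and take
the limit as $N \to \infty$ to adjust the time scale with the order terms of
(2.29). So,

$$E\left[\log \prod_{i=1}^{N} \frac{1 + \sigma^{(i)}}{1 + \rho^{(i)}}\right] = E\left[\sum_{i=1}^{N} \log \frac{1 + \sigma^{(i)}}{1 + \rho^{(i)}}\right].$$

Since the vectors $(\sigma^{(i)}, \rho^{(i)})$ are identically distributed,

$$\begin{aligned}
E\left[\log \prod_{i=1}^{N} \frac{1 + \sigma^{(i)}}{1 + \rho^{(i)}}\right] &= N E\left[\log \frac{1 + \sigma}{1 + \rho}\right] \\
&= N E[\log(1 + \sigma) - \log(1 + \rho)] \\
&= N E[\sigma - \tfrac{1}{2}\sigma^2 - \rho + \tfrac{1}{2}\rho^2 + \text{higher-order terms}] \quad \text{(by expanding the log)} \\
&= \gamma - \tfrac{1}{2}v_1 - \delta + \tfrac{1}{2}v_2 + o(1) \quad \text{by (2.29)},
\end{aligned}$$

and hence

$$\lim_{N \to \infty} E\left[\log \prod_{i=1}^{N} \frac{1 + \sigma^{(i)}}{1 + \rho^{(i)}}\right] = \gamma - \delta + \tfrac{1}{2}(v_2 - v_1) = \Gamma. \tag{2.32}$$

Clearly, if $\Gamma > 0$, then the A-type increases when rare, and in the
circumstance $\Gamma < 0$, it goes extinct.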
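If we compute the variance of $\log \prod_{i=1}^{N}[(1 + \sigma^{(i)})/(1 + \rho^{(i)})]$
(taking the vectors $(\sigma^{(i)}, \rho^{(i)})$ to be independent as well as
identically distributed), we have

$$\operatorname{var}\left[\log \prod_{i=1}^{N} \frac{1 + \sigma^{(i)}}{1 + \rho^{(i)}}\right] = \sum_{i=1}^{N} \operatorname{var}\left(\log \frac{1 + \sigma^{(i)}}{1 + \rho^{(i)}}\right) = N \operatorname{var}\left(\log \frac{1 + \sigma^{(1)}}{1 + \rho^{(1)}}\right).$$

Since $\{E[\log((1 + \sigma)/(1 + \rho))]\}^2 = o(1/N)$, we find

$$\begin{aligned}
\operatorname{var}\left[\log \prod_{i=1}^{N} \frac{1 + \sigma^{(i)}}{1 + \rho^{(i)}}\right] &= N E\left[\left(\log \frac{1 + \sigma}{1 + \rho}\right)^2\right] + o(1) \\
&= N E[(\sigma - \rho)^2] + o(1) \\
&= v_1 + v_2 - 2r + o(1),
\end{aligned}$$

and therefore

$$\lim_{N \to \infty} \operatorname{var}\left[\log \prod_{i=1}^{N} \frac{1 + \sigma^{(i)}}{1 + \rho^{(i)}}\right] = v_1 + v_2 - 2r = V. \tag{2.33}$$

Thus, the quantities $\Gamma$ and $V$ play an important role in describing the
behavior of the system when the A-type is rare.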
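
The limits (2.32) and (2.33) can be illustrated with a small exact computation. This is a sketch under stated assumptions rather than anything from the original text: the selection intensities are given a hypothetical two-point law with independent components (so $r = 0$), and the function name `log_growth_moments` is invented for the example.

```python
import math
from itertools import product

def log_growth_moments(N, gamma, delta, v1, v2):
    """Return N*mean and N*variance of log((1 + sigma)/(1 + rho)) for one generation.

    Assumption: sigma = gamma/N +/- sqrt(v1/N), rho = delta/N +/- sqrt(v2/N),
    signs equally likely and independent, which gives r = 0 in (2.29).
    """
    sigmas = [gamma / N + sign * math.sqrt(v1 / N) for sign in (+1, -1)]
    rhos = [delta / N + sign * math.sqrt(v2 / N) for sign in (+1, -1)]
    values = [math.log((1 + s) / (1 + r)) for s, r in product(sigmas, rhos)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return N * mean, N * variance

if __name__ == "__main__":
    gamma, delta, v1, v2 = 1.0, 0.5, 0.8, 0.4
    print(log_growth_moments(10**6, gamma, delta, v1, v2))
    print(gamma - delta + 0.5 * (v2 - v1), v1 + v2)  # Gamma and V with r = 0
```

With independent generations the mean and variance of the log-product over $N$ generations are $N$ times the one-generation values, which is why a single generation suffices here.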


G. A POPULATION MODEL OF MUTANT TYPES 


The population maintains constant size of N individuals. The times between
successive changes in the population are exponentially distributed with
parameter $\lambda$. When a change occurs, an individual is chosen at random to
die, and independently an individual is chosen at random to bear an offspring.
Each newborn may mutate with probability $\mu$ and create a new type. An
individual is called a p-mutant if in its chain of ancestors (including itself)
p mutations have occurred.

Let $X(t)$ be the number of 0-mutants at time $t$ and suppose the population
composition at time $t$ has $X(t) = k$. Consider the time duration $(t, t + h)$
with $h$ small. The number $X(t)$ will increase by 1 if an event occurs
entailing the death of a 1 or higher mutant type and the birth of a new
0-mutant which does not mutate. At the end of the time interval $(t, t + h)$ we
have

$$X(t + h) = k + 1 \quad \text{with probability} \quad \lambda h\,\frac{N - k}{N}\,\frac{k}{N}(1 - \mu) + o(h). \tag{2.34}$$
In a similar manner we find that

$$\begin{aligned}
X(t + h) &= k - 1 && \text{with probability } \lambda h\,\frac{k}{N}\left[\frac{k}{N}\mu + \frac{N - k}{N}\right] + o(h), \\
X(t + h) &= k && \text{with probability } 1 - \lambda h + \lambda h\left[\frac{N - k}{N}\left(\frac{N - k}{N} + \frac{k}{N}\mu\right) + \frac{k}{N}\,\frac{k}{N}(1 - \mu)\right] + o(h).
\end{aligned} \tag{2.35}$$

Of course $X(t + h) = l$ ($l \ne k, k - 1, k + 1$) occurs with probability $o(h)$.
The expected time between successive events is $1/\lambda$. Thus the average
duration of one generation (i.e., replacement of N individuals or equivalently
the occurrence of N events) is $N/\lambda$. Consider $Y_N(t) = X(t)/N$, the
proportion of 0-mutants at time $t$. We shall speed up the rate of events
($\lambda \to \infty$) and also let $N \to \infty$ in such a manner that
$Y_N(t)$ will behave approximately as a diffusion process where one unit of time
in the limit process corresponds to about N generations of the original
process. For this objective we let $\lambda = N^2$.

Let

$$\Delta_h Y_N(t) = Y_N(t + h) - Y_N(t) = \frac{X(t + h) - X(t)}{N}.$$

Next we compute, using (2.34) and (2.35) with $Nx = k$ and $h = 1/N$, the mean
expected change

$$\frac{1}{h} E[\Delta_h Y_N(t) \mid Y_N(t) = x] = \frac{\lambda}{N}\left[\frac{N - k}{N}\,\frac{k}{N}(1 - \mu) - \frac{k}{N}\left(\frac{k}{N}\mu + \frac{N - k}{N}\right)\right] + \frac{o(h)}{h}. \tag{2.36}$$


Let $\mu = \theta/N$, with $\theta$ fixed, and recall that $\lambda = N^2$. The
right side of (2.36) then reduces to $-\mu N[k/N] + o(1)$, which tends to
$-\theta x$. A similar calculation leads to the relation

$$\lim_{h \downarrow 0} \frac{1}{h} E\big[\{\Delta_h Y_N(t)\}^2 \,\big|\, Y_N(t) = x\big] = 2x(1 - x) \tag{2.37}$$

and we also obtain

$$\lim_{h \downarrow 0} \frac{1}{h} E\big[\{\Delta_h Y_N(t)\}^4 \,\big|\, Y_N(t) = x\big] = 0. \tag{2.38}$$


The computations in this model are quite simple since the difference between
$X(t + h)$ and $X(t)$ effectively can be only 0 or $\pm 1$ for $h$ sufficiently small.
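
The reduction of (2.36) and the limit (2.37) can be verified directly from the jump rates in (2.34) and (2.35). The sketch below is illustrative only; the function names are hypothetical, and it assumes the normalizations $\lambda = N^2$ and $\mu = \theta/N$ stated above.

```python
def transition_rates(k, N, lam, mu):
    """Rates of the jumps k -> k+1 and k -> k-1 read off from (2.34) and (2.35)."""
    a = (N - k) / N  # fraction of individuals that are not 0-mutants
    b = k / N        # fraction of 0-mutants
    up = lam * a * b * (1 - mu)
    down = lam * b * (b * mu + a)
    return up, down

def infinitesimal_moments(x, N, theta):
    """Approximate drift and variance of Y_N = X/N at x, with lam = N^2, mu = theta/N."""
    lam, mu = N**2, theta / N
    k = round(x * N)
    up, down = transition_rates(k, N, lam, mu)
    drift = (up - down) / N         # each jump moves Y_N by +/- 1/N
    variance = (up + down) / N**2   # squared jump size is 1/N^2
    return drift, variance

if __name__ == "__main__":
    print(infinitesimal_moments(0.25, 10**6, 2.0))  # compare with -theta*x and 2x(1-x)
```

The drift equals $-\theta x$ up to rounding, while the variance differs from $2x(1 - x)$ by a term of order $1/N$.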
A First Passage Process for 2-Mutant Types


We next examine the same process as above which is stopped, however, when 
the first 2-mutant is formed. 

Let Z(t) be the number of 1-mutants existing at time t and suppose that a 
2-mutant type has as yet not appeared. The transition probabilities over the 
time epoch (t,t + h) are readily evaluated by merely recalling the mechanisms 
for changes. We claim that 


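$$\begin{aligned}
P_{k,k+1}(t, t + h) &= \Pr\{Z(t + h) = k + 1 \mid Z(t) = k\} \\
&= \lambda h\,\frac{N - k}{N}\left[\frac{N - k}{N}\mu + \frac{k}{N}(1 - \mu)\right] + o(h).
\end{aligned} \tag{2.39}$$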

Indeed, for Z(t) = k to pass to Z(t + h) = k + 1 requires a death of one of the 
0-mutants (the present state involves N — k such individuals) and then either 
replication of a new 1-mutant or a 0-mutant that mutates without the process 
ceasing, that is, without the formation of a 2-mutant. These contingencies lead 
to the formula (2.39). By similar considerations we obtain 


$$\begin{aligned}
P_{k,k-1}(t, t + h) &= \lambda h\,\frac{k}{N}\,\frac{N - k}{N}(1 - \mu) + o(h), \\
P_{k,k}(t, t + h) &= 1 - \lambda h + \lambda h\left[\frac{N - k}{N}\,\frac{N - k}{N}(1 - \mu) + \frac{k}{N}\left(\frac{N - k}{N}\mu + \frac{k}{N}(1 - \mu)\right)\right] + o(h), \\
P_{k,j}(t, t + h) &= o(h), \qquad j \ne k,\ j \ne k - 1,\ j \ne k + 1,
\end{aligned} \tag{2.40}$$

and finally

$$\Pr\{\text{a 2-mutant is created and thus the process stops in } (t, t + h) \mid Z(t) = k\} = \lambda h\,\frac{k}{N}\mu + o(h).$$


Let $\mu = \theta/N$. If $Z(t)$ is of the order $N$ and events occur rapidly,
then double mutants are created almost instantly. On the other hand, if $Z(t)$
is small and $\lambda$ is small, the first creation time of a double mutant may
involve a very long time. The correct scaling of events to get balance requires
$\lambda = N^{3/2}$, and only if $Z(t)$ is of the order $\sqrt{N}$ is there a
bona fide nontrivial probability distribution for the creation time of the
first 2-mutant. With these normalizations $\lambda = N^{3/2}$, $\mu = \theta/N$,
$Z(t) = x\sqrt{N}$, let

$$Z_N(t) = \frac{Z(t)}{\sqrt{N}}, \qquad \Delta_h Z_N(t) = Z_N(t + h) - Z_N(t).$$

Then


7 BLA, Z(01Zn(0 = x, ie. Z(t) = x /N] 
1 1 x x \@ x 0 
= =- hN??? —— <| 1 — —— — = 
h al all! a i Fal x)| 
x x 0 
— TN 1- UN ( — x) + o(1). (2.41) 


The limit as N > œ is 0. An analogous calculation gives 


$$\lim_{h \downarrow 0} \frac{1}{h} E\big[\{\Delta_h Z_N(t)\}^2 \,\big|\, Z_N(t) = x\big] = 2x.$$


The infinitesimal killing rate at $Z_N(t) = x = k/\sqrt{N}$ is

$$\lim_{h \downarrow 0} \frac{1}{h}\,\lambda h\,\frac{k}{N}\,\mu = \theta x.$$

Thus it is plausible that the approximating process can be identified with a
diffusion involving killing (see Section 1) whose infinitesimal parameters are

$$\mu(x) = 0, \qquad \sigma^2(x) = 2x, \qquad k(x) = \theta x.$$


We shall return to this class of examples in Section 10. 


3. Differential Equations Associated with Certain Functionals


We emphasized in the introductory section the ease of ascertaining distri- 
butional properties of a broad spectrum of natural functionals defined on 
diffusion processes (especially in the one-dimensional case). The next two 
sections will amply illustrate these claims. Formal justifications of the methods 
rely on semigroup operator arguments (Sections 11 and 12). We at first discuss 
three basic problems and then provide, in Section 4, examples in which they 
arise. A number of important extensions of the ideas and methods are 
set forth at the close of this section. 

We assume in this section, unless stated otherwise, that $\{X(t), t \ge 0\}$ is a
time homogeneous diffusion process satisfying the following conditions:


1. The state space is an interval $I$ of the form $[l, r]$, $(l, r]$, $[l, r)$,
or $(l, r)$, where $-\infty \le l < r \le \infty$.
2. The process is regular in the interior of $I$; i.e.,

$$\Pr\{T(y) < \infty \mid X(0) = x\} > 0, \qquad l < x,\ y < r,$$

where $T(y)$ is the first time, if any, the process reaches the value $y$ (the
hitting time of $y$; cf. p. 162).
3. The process has infinitesimal parameters $\mu(x)$ and $\sigma^2(x)$, for
$l < x < r$, where $\Delta X = X(h) - X(0)$ and

$$\mu(x) = \lim_{h \downarrow 0} \frac{1}{h} E[\Delta X \mid X(0) = x]$$

and

$$\sigma^2(x) = \lim_{h \downarrow 0} \frac{1}{h} E[(\Delta X)^2 \mid X(0) = x].$$

4. The infinitesimal parameters $\mu(x)$ and $\sigma^2(x)$ are continuous
functions of $x$ and $\sigma^2(x) > 0$ for $l < x < r$.


Let $a$ and $b$ be fixed, subject to $l < a < b < r$, and let $T(y) = T_y$ be
the hitting time of $y$. Throughout this section we let

$$T^* = T_{a,b} = \min\{T(a), T(b)\} = T(a) \wedge T(b)$$


be the first time the process reaches either a or b. 
This section concentrates on three problems. 


Problem A. Find 
u(x) = Pr{T(b) < T(a)| X(0) = x}, a<x<b, 


the probability that the process reaches b before a. 


Problem B. Find
v(x) = E[T*| X(0) = x], a<x<b,


the mean time to reach either a or b. 


Problem C. For a bounded and continuous function $g$, find

$$w(x) = E\left[\int_0^{T^*} g(X(s))\,ds \,\middle|\, X(0) = x\right], \qquad a < x < b.$$


Since the sample paths of the diffusion process are continuous, the integral
$A = \int_0^{T^*} g(X(s))\,ds$ is defined. If g(x) represents a cost rate incurred whenever the
process is in state x, then A would be the total cost up to the time when either 
a or b was first reached. If g(x) = 1 for all x, then A = T*, the time to reach a 
or b, so that Problem B is a special case of Problem C. 

Under the stated conditions (1)-(4), it can be shown that $v(x)$ and $w(x)$ are
finite, that $u(x)$, $v(x)$, and $w(x)$ possess two bounded derivatives for
$a < x < b$, and that these functions satisfy the following differential
equations:

Equation A

$$0 = \mu(x)\frac{du}{dx} + \frac{1}{2}\sigma^2(x)\frac{d^2u}{dx^2} \quad \text{for } a < x < b, \qquad u(a) = 0,\ u(b) = 1; \tag{3.1}$$

Equation B

$$-1 = \mu(x)\frac{dv}{dx} + \frac{1}{2}\sigma^2(x)\frac{d^2v}{dx^2} \quad \text{for } a < x < b, \qquad v(a) = v(b) = 0; \tag{3.2}$$

Equation C

$$-g(x) = \mu(x)\frac{dw}{dx} + \frac{1}{2}\sigma^2(x)\frac{d^2w}{dx^2} \quad \text{for } a < x < b, \qquad w(a) = w(b) = 0. \tag{3.3}$$


The explicit solutions to these differential equations appear in (3.10) to (3.12). 

A straightforward heuristic justification of the above differential equations 
is now described. Consider Problem A. The boundary conditions u(a) = 0, 
u(b) =1 are obvious, since u(x) is the probability of reaching b before a, 
starting from x. Now consider a < x <b, and choose a time duration h 
sufficiently small that the probability of reaching a or b before time h is 
negligible. At time h, conditioning on the position of X(h), the probability of 
reaching b before a is u(X(h)). Invoking the law of total probabilities gives 


$$u(x) = E[u(X(h)) \mid X(0) = x] + o(h),$$

where the error term $o(h)$ is of smaller order than $h$. We now write
$\Delta X = X(h) - x$ and assume we can expand in a Taylor series

$$u(X(h)) = u(x + \Delta X) = u(x) + \Delta X\,u'(x) + \tfrac{1}{2}(\Delta X)^2 u''(x) + \cdots,$$

for which the fourth and further terms are of order smaller than $(\Delta X)^2$
and may be neglected. Since $E[\Delta X \mid X(0) = x] = \mu(x)h + o(h)$ and

$$E[(\Delta X)^2 \mid X(0) = x] = \sigma^2(x)h + o(h),$$

we have

$$\begin{aligned}
u(x) &= E[u(X(h)) \mid X(0) = x] + o(h) \\
&= E[u(x + \Delta X) \mid X(0) = x] + o(h) \\
&= u(x) + E[\Delta X \mid X(0) = x]\,u'(x) + \tfrac{1}{2}E[(\Delta X)^2 \mid X(0) = x]\,u''(x) + o(h) \\
&= u(x) + \mu(x)h\,u'(x) + \tfrac{1}{2}\sigma^2(x)h\,u''(x) + o(h).
\end{aligned}$$

We may subtract $u(x)$ from both sides, divide by $h$, and let $h$ decrease to
zero to conclude

$$0 = \mu(x)u'(x) + \tfrac{1}{2}\sigma^2(x)u''(x), \qquad a < x < b,$$


which is the desired differential equation for Problem A. 

We shall motivate the derivation of (3.3) in the same vein. Again, the
boundary conditions should be clear. Choose a short time duration $h$ as before.
At time $h$, conditioning on $X(h)$, the expectation of the total integral is
the expectation of the contribution up to time $h$, $\int_0^h g(X(s))\,ds$, plus
the expectation of the contribution over the remaining time. Conditioned on
$X(h) = z$, the conditional mean of the second part is

$$\begin{aligned}
E\left[\int_h^{T^*} g(X(t))\,dt \,\middle|\, X(h) = z\right] &= E\left[\int_0^{T^*} g(X(t))\,dt \,\middle|\, X(0) = z\right] \quad \text{(by the Markov property and stationarity)} \\
&= w(z) \quad \text{(by the definition of } w\text{)}.
\end{aligned}$$

Thus, for $a < x < b$, we have

$$w(x) = E\left[\int_0^h g(X(s))\,ds \,\middle|\, X(0) = x\right] + E[w(X(h)) \mid X(0) = x]. \tag{3.4}$$

Since the sample paths and $g$ are continuous we have the approximation

$$E\left[\int_0^h g(X(s))\,ds \,\middle|\, X(0) = x\right] = g(x)h + o(h),$$

and, as before,

$$\begin{aligned}
E[w(X(h)) \mid X(0) = x] &= E[w(x + \Delta X) \mid X(0) = x] \\
&= w(x) + \mu(x)w'(x)h + \tfrac{1}{2}\sigma^2(x)w''(x)h + o(h),
\end{aligned}$$

so that (3.4) becomes

$$w(x) = g(x)h + w(x) + \mu(x)w'(x)h + \tfrac{1}{2}\sigma^2(x)w''(x)h + o(h).$$


Then, subtracting w(x) from both sides, dividing by h and sending h to zero 
will produce the differential equation for Problem C. 

We turn to the solutions of the three problems. After recalling the
assumption that $\sigma^2(x) > 0$ for $l < x < r$, let

$$s(x) = \exp\left\{-\int^x [2\mu(\xi)/\sigma^2(\xi)]\,d\xi\right\} \quad \text{for } l < x < r. \tag{3.5}$$

We use indefinite integrals here and in what follows for reasons that will later
become clear. Next we introduce the fundamentally important scale function of
the process

$$S(x) = \int^x s(\eta)\,d\eta = \int^x \exp\left\{-\int^{\eta} [2\mu(\xi)/\sigma^2(\xi)]\,d\xi\right\} d\eta \quad \text{for } l < x < r \tag{3.6}$$

and speed density

$$m(x) = 1/[\sigma^2(x)s(x)] \quad \text{for } l < x < r. \tag{3.7}$$

Equations A-C each involve the differential operator $L$ defined by

$$Lf(x) = \mu(x)f'(x) + \tfrac{1}{2}\sigma^2(x)f''(x),$$


for f(x) a twice continuously differentiable function on (a,b). A modern 
approach to their solution is to express this operator as successive differen- 
tiations with respect to the scale and speed measure. To this end, first verify 
that $s'(x)/s(x) = -2\mu(x)/\sigma^2(x)$. Then, following a classical approach,
introduce $1/s(x)$ as an integrating factor and thereby separate the variables,
achieving

$$Lf(x) = \frac{1}{2}\,\frac{1}{1/[\sigma^2(x)s(x)]}\,\frac{d}{dx}\left[\frac{1}{s(x)}\frac{df(x)}{dx}\right]. \tag{3.8}$$

To obtain a more succinct and meaningful expression for $L$, write
$s(x) = dS(x)/dx$ in the differential form $dS = s(x)\,dx$ and similarly write
the speed density as a differential of a speed measure $M$ in the form
$dM = m(x)\,dx$. In terms of these differentials, the operator $L$ in (3.8) is
simply

$$Lf(x) = \frac{1}{2}\frac{d}{dM}\left[\frac{df(x)}{dS}\right]. \tag{3.9}$$

The above expression is called the canonical representation of the differential
infinitesimal operator associated with the diffusion process. In terms of this
canonical representation, the differential equation for Problem A becomes

$$\frac{1}{2}\frac{d}{dM}\left[\frac{du(x)}{dS}\right] = 0 \quad \text{for } a < x < b, \qquad u(a) = 0,\ u(b) = 1.$$

The solution follows directly from two successive integrations.

Denoting the constants of integration by $\alpha$ and $\beta$, we integrate once
to obtain $du(x)/dS(x) = \beta$, and then again to get
$u(x) = \alpha + \beta S(x)$, $a < x < b$. The boundary condition $u(a) = 0$
determines $\alpha = -\beta S(a)$, and then $u(b) = 1$ yields
$\beta = 1/[S(b) - S(a)]$. We summarize the foregoing analysis in an explicit
form.


Solution A

$$u(x) = \frac{S(x) - S(a)}{S(b) - S(a)} \quad \text{for } a < x < b. \tag{3.10}$$

Note that $u(x)$ is unchanged if $S(x)$ is replaced by
$S^*(x) = \alpha + \beta S(x)$ for any constants $\alpha$ and $\beta \ne 0$. Thus
if $S(x)$ is a scale function, so is $S^*(x)$. It follows that we may specify
the scale function $S(x)$ using indefinite integrals since the resulting $u(x)$
does not depend on the lower limits of integration [see (3.6)].


Remark 3.1. The scale function can be used to rescale the state space $(l, r)$
in terms of the probabilities of achieving various levels, and this use
motivates the name. Fix a point $x_0$ as the origin and determine the scale
function by performing a translation, if necessary, causing $S(x_0) = 0$. Then
form the process $Y(t) = S(X(t))$ on the interval $(S(l), S(r))$. Since $S$ is
strictly monotone and twice continuously differentiable, an appeal to Theorem
2.1 establishes that the infinitesimal parameters of the $\{Y(t)\}$ process are

$$\mu_Y(y) = \tfrac{1}{2}\sigma^2(x)S''(x) + \mu(x)S'(x) = 0$$

and

$$\sigma_Y^2(y) = \sigma^2(x)[S'(x)]^2 = \sigma^2(x)s^2(x), \quad \text{where } y = S(x).$$

The scale measure for the $\{Y(t)\}$ process is accordingly $S_Y(y) = y$, or what
is equivalent, $S_Y(y) = \alpha + \beta y$, with $\alpha$, $\beta \ne 0$
constants. A process $\{Y(t)\}$ whose scale function is linear is said to be in
natural or canonical scale since the hitting probabilities

$$\Pr\{T_Y(a) < T_Y(b) \mid Y(0) = y\} = \frac{b - y}{b - a} \quad \text{for } a < y < b,$$

are manifestly proportional to actual distances.
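
Solution A lends itself to direct numerical evaluation. The following sketch is not part of the original text: it approximates the scale function by the midpoint rule and forms the ratio in (3.10). The helper names, the choice of 0 as the lower limit of the inner integral, and the step counts are all assumptions made for illustration.

```python
import math

def midpoint(f, lo, hi, steps):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (j + 0.5) * h) for j in range(steps)) * h

def scale_density(x, mu, sigma2, x0=0.0, steps=200):
    """s(x) = exp(-int_{x0}^{x} 2 mu / sigma^2); x0 is an arbitrary lower limit."""
    return math.exp(-midpoint(lambda xi: 2 * mu(xi) / sigma2(xi), x0, x, steps))

def hitting_probability(x, a, b, mu, sigma2, steps=400):
    """u(x) = [S(x) - S(a)] / [S(b) - S(a)], the probability of reaching b before a."""
    s = lambda eta: scale_density(eta, mu, sigma2)
    return midpoint(s, a, x, steps) / midpoint(s, a, b, steps)

if __name__ == "__main__":
    # Brownian motion with constant drift 0.5 and unit variance on (0, 2), started at 1.
    u = hitting_probability(1.0, 0.0, 2.0, lambda x: 0.5, lambda x: 1.0)
    exact = (1 - math.exp(-1.0)) / (1 - math.exp(-2.0))
    print(u, exact)
```

For constant coefficients the scale density is $s(x) = e^{-2\mu x/\sigma^2}$, so the ratio has a closed form against which the quadrature can be compared; with zero drift the result reduces to $(x - a)/(b - a)$, the natural-scale case.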


We proceed to Problem C since Problem B is a special case. In the
canonical representation, the differential equation is written

$$\frac{1}{2}\frac{d}{dM}\left[\frac{dw(x)}{dS}\right] = -g(x) \quad \text{for } a < x < b,$$

subject to the boundary conditions

$$w(a) = w(b) = 0.$$

Upon the first integration, we obtain

$$\frac{dw}{dS}(\eta) = -2\int_a^{\eta} g(\xi)\,dM(\xi) + \beta = -2\int_a^{\eta} g(\xi)m(\xi)\,d\xi + \beta,$$

and after the second,

$$w(x) = -2\int_a^x \left[\int_a^{\eta} g(\xi)m(\xi)\,d\xi\right] dS(\eta) + \beta[S(x) - S(a)] + \alpha.$$

Then $w(a) = 0$ implies $\alpha = 0$, and $w(b) = 0$ requires

$$\beta = \frac{2}{S(b) - S(a)}\int_a^b \left[\int_a^{\eta} g(\xi)m(\xi)\,d\xi\right] dS(\eta).$$

Using $u(x) = [S(x) - S(a)]/[S(b) - S(a)]$, we obtain

$$w(x) = 2\left\{u(x)\int_a^b \left[\int_a^{\eta} g(\xi)m(\xi)\,d\xi\right] dS(\eta) - \int_a^x \left[\int_a^{\eta} g(\xi)m(\xi)\,d\xi\right] dS(\eta)\right\}.$$

Changing the order of integration followed by elementary algebraic
manipulations reduces this to a symmetric form.


Solution C

$$w(x) = 2\left\{u(x)\int_x^b [S(b) - S(\xi)]\,m(\xi)g(\xi)\,d\xi + [1 - u(x)]\int_a^x [S(\xi) - S(a)]\,m(\xi)g(\xi)\,d\xi\right\}. \tag{3.11}$$

As mentioned earlier, the special case $g(\xi) = 1$ yields the solution to
Problem B.


Solution B

$$v(x) = 2\left\{u(x)\int_x^b [S(b) - S(\xi)]\,m(\xi)\,d\xi + [1 - u(x)]\int_a^x [S(\xi) - S(a)]\,m(\xi)\,d\xi\right\}. \tag{3.12}$$
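
Formula (3.12) can be evaluated numerically once the scale function and speed density are known. The sketch below is an illustration under assumptions, not the book's procedure: `S` and `m` are passed in as Python functions, the integrals use a simple midpoint rule, and the names are hypothetical. The drift example relies on the elementary fact that for constant $\mu \ne 0$ and $\sigma^2$ the function $[(b - a)u(x) - (x - a)]/\mu$ solves Equation B.

```python
import math

def midpoint(f, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (j + 0.5) * h) for j in range(steps)) * h

def mean_exit_time(x, a, b, S, m):
    """v(x) from (3.12), given scale function S and speed density m."""
    u = (S(x) - S(a)) / (S(b) - S(a))
    upper = midpoint(lambda xi: (S(b) - S(xi)) * m(xi), x, b)
    lower = midpoint(lambda xi: (S(xi) - S(a)) * m(xi), a, x)
    return 2 * (u * upper + (1 - u) * lower)

if __name__ == "__main__":
    # Standard Brownian motion: S(x) = x, m(x) = 1, so v(x) = (x - a)(b - x).
    print(mean_exit_time(0.3, 0.0, 1.0, lambda x: x, lambda x: 1.0))
    # Drift 0.5, unit variance: s(x) = exp(-x), S(x) = -exp(-x), m(x) = exp(x).
    print(mean_exit_time(1.0, 0.0, 2.0, lambda x: -math.exp(-x), lambda x: math.exp(x)))
```

The first call reproduces the familiar value $(x - a)(b - x)$ for standard Brownian motion; the second can be compared with the closed form mentioned in the lead-in.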


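Remark 3.2. Consider for a moment a process in natural scale wherein
$S(x) = x$ and $s(x) = S'(x) = 1$. This can be achieved by a suitable
transformation of the state space; see Remark 3.1. Calculate the mean time to
exit the interval $(x - \varepsilon, x + \varepsilon)$ starting at $x$.
Inserting $a = x - \varepsilon$, $b = x + \varepsilon$, $u(x) = \tfrac{1}{2}$,
and $S(x) = x$ in (3.12), we obtain

$$E[T_{x-\varepsilon,\,x+\varepsilon} \mid X(0) = x] = \int_x^{x+\varepsilon} (x + \varepsilon - \xi)\,m(\xi)\,d\xi + \int_{x-\varepsilon}^{x} (\xi - x + \varepsilon)\,m(\xi)\,d\xi,$$

whence

$$\begin{aligned}
\lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon^2} E[T_{x-\varepsilon,\,x+\varepsilon} \mid X(0) = x] &= \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon^2}\int_x^{x+\varepsilon} (x + \varepsilon - \xi)\,m(\xi)\,d\xi \\
&\quad + \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon^2}\int_{x-\varepsilon}^{x} (\xi - x + \varepsilon)\,m(\xi)\,d\xi \\
&= m(x).
\end{aligned}$$

Thus the speed density $m(x)$ can be construed as the speed at which the
clock of the process runs when located at the state point $x$. Equivalently, if
we regard the clock of a standard Brownian motion as standard, and provided the
process is in natural scale, the quantity $m(x)\varepsilon^2$ is of the order of
the expected time the process spends in the interval
$(x - \varepsilon, x + \varepsilon)$ given $X(0) = x$ before departure thereof.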


Remark 3.3. It is of interest to calculate the expected amount of time prior to
$T^*$ that the process spends in an interval $[\xi, \xi + \Delta)$. Shrinking
$\Delta$ to $d\xi$ gives the expected local time at $\xi$. To make these ideas
more precise, write the solution to Problem C,

$$w(x) = E\left[\int_0^{T^*} g(X(s))\,ds \,\middle|\, X(0) = x\right], \qquad a < x < b, \tag{3.13}$$

in the form

$$w(x) = \int_a^b G(x, \xi)\,g(\xi)\,d\xi, \tag{3.14}$$

where

$$G(x, \xi) = \begin{cases}
2\,\dfrac{[S(x) - S(a)][S(b) - S(\xi)]}{S(b) - S(a)}\,\dfrac{1}{\sigma^2(\xi)s(\xi)}, & a < x \le \xi < b, \\[2ex]
2\,\dfrac{[S(b) - S(x)][S(\xi) - S(a)]}{S(b) - S(a)}\,\dfrac{1}{\sigma^2(\xi)s(\xi)}, & a < \xi \le x < b.
\end{cases} \tag{3.15}$$

The function $G(x, \xi)$ is called the Green function of the process on the
interval $[a, b]$. Determining the mean time prior to $T^*$ that the process
spends in the interval $[\xi, \xi + \Delta)$ is equivalent to evaluating

$$w(x) = E\left[\int_0^{T^*} g(X(s))\,ds \,\middle|\, X(0) = x\right]$$

for

$$g(x) = \begin{cases} 1, & \xi \le x < \xi + \Delta, \\ 0, & \text{otherwise}. \end{cases} \tag{3.16}$$

Following the format of (3.14), this is

$$w(x) = \int_{\xi}^{\xi + \Delta} G(x, \eta)\,d\eta. \tag{3.17}$$

While the function $g(x)$ as defined in (3.16) does not satisfy the continuity
assumption of Problem C, nevertheless, (3.17) can be established by introducing
suitable approximating continuous functions.

Shrinking $\Delta$ to $d\xi$, we see from (3.17) that $G(x, \xi)\,d\xi$ measures
the mean time prior to $T^*$ that the process spends in the infinitesimal
interval $[\xi, \xi + d\xi)$ given $X(0) = x$.


DIGRESSION: THE GREEN FUNCTION OF A SECOND-ORDER DIFFERENTIAL OPERATOR†


In terms of the differential operator
$L = \mu(x)\,d/dx + \tfrac{1}{2}\sigma^2(x)\,d^2/dx^2$, the solution to the
second-order differential equation

$$Lw(x) = -g(x), \quad a < x < b, \quad \text{with } w(a) = w(b) = 0$$

is given by

$$w(x) = \int_a^b G(x, \xi)\,g(\xi)\,d\xi,$$

so that $-G$, considered as an integral operator, is the inverse to the
differential operator $L$.

We now review some material relating to the more general second-order 
differential equation 


$$Ly = p(x)y'' + q(x)y' + r(x)y = -f(x) \quad \text{for } a < x < b, \tag{3.18}$$

with $y(x)$ obeying associated boundary conditions and $p(x) > 0$ on $[a, b]$. We
assume the coefficients $p$, $q$, and $r$ are as smooth as required.
To be specific, we concentrate on the boundary conditions

$$y(a) = y(b) = 0. \tag{3.19}$$


(Other choices of boundary conditions can be handled by similar means). If 
the homogeneous equation Ly = 0 admits no nonzero solution fulfilling the 
boundary conditions, then (3.18) can be inverted in the form of an integral 
operator 


$$y(x) = \int_a^b G(x, \xi)\,f(\xi)\,d\xi, \tag{3.20}$$

where $G(x, \xi)$ is commonly referred to as the Green function of the boundary
value problem. The following properties characterize $G(x, \xi)$:


(i) For each $\xi$ in $(a, b)$ the function $y(x) = G(x, \xi)$ solves the
equation $Ly = 0$ for $x$ in each interval $(a, \xi)$ and $(\xi, b)$, so that
$LG(x, \xi) = 0$.
(ii) For each $\xi$ in $(a, b)$ the function $y(x) = G(x, \xi)$ satisfies the
boundary conditions (3.19).
(iii) The derivative of $G$ exhibits a suitable jump discontinuity

$$\frac{\partial G}{\partial x}(\xi+, \xi) - \frac{\partial G}{\partial x}(\xi-, \xi) = -\frac{1}{p(\xi)}.$$

† This digression is not a prerequisite to what follows.

The Green function $G(x, \xi)$ is obtained as follows. Let $y_1(x)$ [$y_2(x)$] be
a solution of $Ly = 0$ satisfying the initial conditions $y_1(a) = 0$ and
$y_1'(a) > 0$ [$y_2(b) = 0$, $y_2'(b) < 0$]. More generally, $y_1$ is determined
up to a constant factor as a solution of $Ly = 0$ satisfying the left boundary
condition, while analogously the solution $y_2$ obeys the right-hand boundary
condition. These exist and are easily determined in most examples. The functions
$y_1(x)$ and $y_2(x)$ are linearly independent since we assumed that there was no
nonzero solution of $Ly = 0$ obeying both boundary conditions. Let


$$W(\xi) = W(y_1, y_2)(\xi) = y_1(\xi)y_2'(\xi) - y_1'(\xi)y_2(\xi) \tag{3.21}$$

be the Wronskian of the functions. The following argument shows that $W(\xi)$
is nonzero for $a < \xi < b$. Differentiate $W$ and use the fact that $y_i$
satisfies $Ly_i = 0$, to produce

$$\frac{d}{d\xi}W(y_1, y_2) = y_1y_2'' - y_1''y_2 = -\frac{q}{p}(y_1y_2' - y_1'y_2) = -\frac{q}{p}W. \tag{3.22}$$

Solving this differential equation, we infer immediately that

$$W(\xi) = W(\xi_0)\exp\left\{-\int_{\xi_0}^{\xi} \frac{q(\eta)}{p(\eta)}\,d\eta\right\},$$

so if $W$ vanishes at some point, then $W$ is identically zero. Consequently

$$\left(\frac{y_1}{y_2}\right)' = -\frac{W}{(y_2)^2} = 0$$

wherever $y_2$ does not vanish, which means that $y_1$ is a multiple of $y_2$,
and they are not linearly independent, contrary to a previous assumption. Thus
$W(\xi) \ne 0$ for $a < \xi < b$.


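Now form

$$G(x, \xi) = \begin{cases}
-\dfrac{y_1(x)\,y_2(\xi)}{W(\xi)p(\xi)} & \text{for } a \le x \le \xi \le b, \\[2ex]
-\dfrac{y_1(\xi)\,y_2(x)}{W(\xi)p(\xi)} & \text{for } a \le \xi \le x \le b,
\end{cases} \tag{3.23}$$

which is well defined since $W(\xi)p(\xi) \ne 0$ for all $a < \xi < b$. For
$f(x)$ continuous, we claim that the function

$$w(x) = \int_a^b G(x, \xi)\,f(\xi)\,d\xi \tag{3.24}$$

satisfies the boundary conditions (3.19) and solves $Lw = -f$. That $w$ satisfies
(3.19) is verified using the boundary values of $y_1$ and $y_2$. Next, we
evaluate $Lw$. Notice that

$$w(x) = \int_a^b G(x, \xi)\,f(\xi)\,d\xi = -y_2(x)\int_a^x \frac{y_1(\xi)f(\xi)}{W(\xi)p(\xi)}\,d\xi - y_1(x)\int_x^b \frac{y_2(\xi)f(\xi)}{W(\xi)p(\xi)}\,d\xi,$$

$$w'(x) = -y_2'(x)\int_a^x \frac{y_1(\xi)f(\xi)}{W(\xi)p(\xi)}\,d\xi - y_1'(x)\int_x^b \frac{y_2(\xi)f(\xi)}{W(\xi)p(\xi)}\,d\xi. \tag{3.25}$$

Next we calculate

$$\begin{aligned}
w''(x) &= -y_2''(x)\int_a^x \frac{y_1(\xi)f(\xi)}{W(\xi)p(\xi)}\,d\xi - y_1''(x)\int_x^b \frac{y_2(\xi)f(\xi)}{W(\xi)p(\xi)}\,d\xi \\
&\quad - \frac{y_2'(x)y_1(x)f(x)}{W(x)p(x)} + \frac{y_1'(x)y_2(x)f(x)}{W(x)p(x)} \\
&= -y_2''(x)\int_a^x \frac{y_1(\xi)f(\xi)}{W(\xi)p(\xi)}\,d\xi - y_1''(x)\int_x^b \frac{y_2(\xi)f(\xi)}{W(\xi)p(\xi)}\,d\xi - \frac{f(x)}{p(x)}.
\end{aligned} \tag{3.26}$$

Combining Eqs. (3.24)-(3.26) in the obvious manner, we obtain

$$Lw(x) = pw'' + qw' + rw = -f(x).$$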
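
The recipe (3.23) can be tried on the simplest boundary value problem. The sketch below is an illustration only, with hypothetical function names: it takes $p = 1$, $q = r = 0$ on $(0, 1)$, so that $Ly = y''$, with $y_1(x) = x$, $y_2(x) = 1 - x$, and Wronskian $y_1y_2' - y_1'y_2 = -1$, and evaluates (3.24) by the midpoint rule.

```python
import math

def green_from_solutions(y1, y2, wronskian, p):
    """Build G(x, xi) from (3.23) given y1, y2, their Wronskian W and the coefficient p."""
    def G(x, xi):
        lo, hi = (x, xi) if x <= xi else (xi, x)
        return -y1(lo) * y2(hi) / (wronskian(xi) * p(xi))
    return G

def solve_bvp(G, f, x, a=0.0, b=1.0, steps=2000):
    """w(x) = int_a^b G(x, xi) f(xi) d xi, approximated by the midpoint rule."""
    h = (b - a) / steps
    return sum(G(x, a + (j + 0.5) * h) * f(a + (j + 0.5) * h) for j in range(steps)) * h

# Example operator Ly = y'' on (0, 1) with y(0) = y(1) = 0.
G = green_from_solutions(lambda x: x, lambda x: 1.0 - x, lambda xi: -1.0, lambda xi: 1.0)

if __name__ == "__main__":
    print(solve_bvp(G, lambda xi: 1.0, 0.5))                     # y'' = -1 gives x(1-x)/2
    print(solve_bvp(G, lambda xi: math.sin(math.pi * xi), 0.5))  # gives sin(pi x)/pi^2
```

In this special case $G(x, \xi) = x(1 - \xi)$ for $x \le \xi$, and both test right-hand sides have elementary closed-form solutions against which the quadrature can be compared.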
Returning to the diffusion process operator for which $p(x) = \tfrac{1}{2}\sigma^2(x)$,
$q(x) = \mu(x)$ and $r(x) = 0$, identify

$$Ly = \tfrac{1}{2}\sigma^2(x)y'' + \mu(x)y'. \tag{3.27}$$


Take an interval $(a,b)$ properly contained in $(l, r)$ and construct the Green
function for the differential operator (3.27) with boundary conditions (3.19)
according to the recipe of (3.23). Explicitly take


$$y_1(x) = \frac{S(x) - S(a)}{S(b) - S(a)},$$

where $S(x)$ is the scale function defined in (3.6). Clearly $y_1(x)$ vanishes at $a$ and
$y_1'(a) > 0$. Take $y_2(x) = [S(b) - S(x)]/[S(b) - S(a)]$, exhibiting the properties
$y_2(b) = 0$, $y_2'(b) < 0$. The Wronskian of $y_1$ and $y_2$ reduces to


$$W(\xi) = \left(\frac{S(\xi) - S(a)}{S(b) - S(a)}\right)\left(\frac{-s(\xi)}{S(b) - S(a)}\right) - \left(\frac{S(b) - S(\xi)}{S(b) - S(a)}\right)\left(\frac{s(\xi)}{S(b) - S(a)}\right) = -\frac{s(\xi)}{S(b) - S(a)}$$

with

$$\frac{dS(\xi)}{d\xi} = s(\xi) = \exp\left\{-\int^{\xi}\frac{2\mu(\eta)}{\sigma^2(\eta)}\,d\eta\right\}. \tag{3.28}$$
Thus for the case at hand 
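$$G(x,\xi) = \begin{cases} \dfrac{2[S(x) - S(a)][S(b) - S(\xi)]}{S(b) - S(a)}\cdot\dfrac{1}{\sigma^2(\xi)s(\xi)}, & a \le x \le \xi \le b, \\[2ex] \dfrac{2[S(b) - S(x)][S(\xi) - S(a)]}{S(b) - S(a)}\cdot\dfrac{1}{\sigma^2(\xi)s(\xi)}, & a \le \xi \le x \le b. \end{cases} \tag{3.29}$$

Frequently, in the literature the speed measure density $1/[\sigma^2(\xi)s(\xi)]$ is separated
out and the Green function is identified as


$$G(x,\xi) = \begin{cases} \dfrac{2[S(x) - S(a)][S(b) - S(\xi)]}{S(b) - S(a)}, & a \le x \le \xi \le b, \\[2ex] \dfrac{2[S(b) - S(x)][S(\xi) - S(a)]}{S(b) - S(a)}, & a \le \xi \le x \le b. \end{cases} \tag{3.30}$$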
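
The recipe of (3.28)-(3.29) lends itself to a numerical sketch. The block below is a minimal illustration, not part of the original text: the function name `exit_functionals`, the uniform grid, and the trapezoid rule are all assumptions made for the example, and the drift and variance are assumed to be supplied as Python callables with $\sigma^2 > 0$ on $[a, b]$. It tabulates $u(x) = [S(x) - S(a)]/[S(b) - S(a)]$ and $w(x) = \int_a^b G(x,\xi)g(\xi)\,d\xi$.

```python
import math

def exit_functionals(mu, sigma2, a, b, g=lambda x: 1.0, n=2000):
    """Tabulate u(x) and w(x) = E_x[int_0^T g(X(t)) dt] on a uniform grid of [a, b].

    mu and sigma2 are callables for the infinitesimal drift and variance
    (hypothetical interface chosen for this sketch).
    """
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]

    def cumtrapz(vals):
        # Cumulative trapezoid rule: out[i] approximates the integral from a to xs[i].
        out = [0.0]
        for i in range(1, len(vals)):
            out.append(out[-1] + 0.5 * h * (vals[i - 1] + vals[i]))
        return out

    # Scale density s(x) = exp(-int_a^x 2 mu / sigma2) and scale function S with S(a) = 0.
    s = [math.exp(-e) for e in cumtrapz([2.0 * mu(x) / sigma2(x) for x in xs])]
    S = cumtrapz(s)
    m = [1.0 / (sigma2(x) * sx) for x, sx in zip(xs, s)]  # speed density 1 / (sigma^2 s)
    u = [Sx / S[-1] for Sx in S]                           # chance of hitting b before a

    # lower[i] = int_a^{x_i} [S(xi) - S(a)] m g dxi
    # upper[i] = int_{x_i}^b [S(b) - S(xi)] m g dxi
    lower = cumtrapz([S[i] * m[i] * g(xs[i]) for i in range(n + 1)])
    tail = cumtrapz([(S[-1] - S[i]) * m[i] * g(xs[i]) for i in range(n + 1)])
    upper = [tail[-1] - t for t in tail]
    w = [2.0 * (u[i] * upper[i] + (1.0 - u[i]) * lower[i]) for i in range(n + 1)]
    return xs, u, w

# Standard Brownian motion on (0, 2) with g = 1, so w is the mean exit time.
xs, u, v = exit_functionals(lambda x: 0.0, lambda x: 1.0, 0.0, 2.0)
```

With $g \equiv 1$ the returned `w` approximates the mean exit time; other choices of `g` give the functional of Problem C.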


SOME FURTHER FUNCTIONALS 


Extending the analysis of Problem C, we consider a hierarchy of functionals of 
the form 
$$U(x) = E_x\left[f\left(\int_0^T g(X(\tau))\,d\tau\right)\right] = E\left[f\left(\int_0^T g(X(\tau))\,d\tau\right)\,\Big|\,X(0) = x\right], \qquad a < x < b, \tag{3.31}$$


where $T = T(a) \wedge T(b)$ is the hitting time to $\{a,b\}$, $f$ is twice continuously
differentiable, and $g$ is piecewise smooth. Note that we introduce the subscript
notation on $E$ to indicate the initial point $X(0) = x$ of the process realization.

For $f(x) = x$, we recover the functional of Problem C. The evaluation of
(3.31) for the choice $f(x) = x^n$ produces the $n$th moment of the random variable
$Z = \int_0^T g(X(\tau))\,d\tau$. The calculation with $f_\lambda(x) = e^{\lambda x}$, where feasible, yields the
moment generating function of $Z$.

By the definition of $T$ we plainly have the boundary conditions


$$U(a) = U(b) = f(0). \tag{3.32}$$


We derive a differential equation for U(x) by paraphrasing the method of 
solution of Problem C. To this end, assume g(x) is bounded. For h sufficiently 
small, the Taylor expansion leads to 


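$$\begin{aligned}
U(x) &= E_x\left[f\left(\int_0^h g(X(\tau))\,d\tau + \int_h^T g(X(\tau))\,d\tau\right)\right] \\
&= E_x\left[f\left(\int_h^T g(X(\tau))\,d\tau\right)\right] + E_x\left[f'\left(\int_h^T g(X(\tau))\,d\tau\right)\int_0^h g(X(\tau))\,d\tau\right] + O(h^2).
\end{aligned} \tag{3.33}$$


If $g(\xi)$ is continuous at $x$, then (3.33) has the form


$$E_x\left[f\left(\int_h^T g(X(\tau))\,d\tau\right)\right] + h\,g(x)\,E_x\left[f'\left(\int_h^T g(X(\tau))\,d\tau\right)\right] + o(h).$$

Invoking the law of total probabilities and the Markov property, we can
reduce the first two terms above to


$$\begin{aligned}
&E_x\left[E_{X(h)}\left[f\left(\int_0^T g(X(\tau))\,d\tau\right)\right]\right] + h\,g(x)\,E_x\left[E_{X(h)}\left[f'\left(\int_0^T g(X(\tau))\,d\tau\right)\right]\right] \\
&\qquad = E_x[U(X(h))] + h\,g(x)\,E_x[V(X(h))]
\end{aligned} \tag{3.34}$$


where we define

$$V(x) = E_x\left[f'\left(\int_0^T g(X(\tau))\,d\tau\right)\right]. \tag{3.35}$$


Expanding about $x$, since $X(h)$ is close to $X(0) = x$ for $h$ small (because the
sample paths are continuous by the basic characterization of diffusion
processes), we obtain


$$E_x[U(X(h))] = U(x) + U'(x)E_x[X(h) - x] + \tfrac12 U''(x)E_x[(X(h) - x)^2] + o(h)$$

and

$$E_x[V(X(h))] = V(x) + O(h). \tag{3.36}$$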


Substituting from the infinitesimal relations, subtracting out U(x), and then 
dividing by h and sending h to zero produces the differential equation 


$$\tfrac12\sigma^2(x)U''(x) + \mu(x)U'(x) + g(x)V(x) = 0. \tag{3.37}$$


The solution of (3.37) in conjunction with (3.32) depends on knowing V(x). 
For the case $f(x) = x^n$, we have $f'(x) = nx^{n-1}$. In accordance with (3.37), the
$n$th moment of $Z$ with $X(0) = x$, that is,


$$U_n(x) = E_x\left[\left(\int_0^T g(X(\tau))\,d\tau\right)^n\right],$$


solves


$$\tfrac12\sigma^2(x)U_n''(x) + \mu(x)U_n'(x) + nU_{n-1}(x)g(x) = 0 \tag{3.38}$$


subject to the boundary conditions $U_n(a) = U_n(b) = 0$.

Note that $U_1(x)$ is exactly the expectation of the functional of Problem C,
whose determination is given in (3.12).

In the case $f_\lambda(x) = e^{\lambda x}$, we have $V(x) = \lambda U(x)$, and (3.37) becomes


$$\tfrac12\sigma^2(x)U''(x) + \mu(x)U'(x) + \lambda U(x)g(x) = 0.$$


We next indicate briefly the calculation, of some practical interest, of the
function


$$R(x) = E_x\left[\int_0^T \exp\left\{-\int_0^t k(X(\tau))\,d\tau\right\} g(X(t))\,dt\right] \tag{3.39}$$

for $a < x < b$, where $T$ as before stands for the first passage time to $a$ or $b$.
We assume $g$ and $k$ are continuous at $x$. The function $k(x)$ is referred to as the
killing rate. In fact, where $k$ is positive the multiplier $\exp\{-\int_0^t k(X(\tau))\,d\tau\}$,
depending on the previous history of the process, serves as a discount factor
on the yield $g(X(t))$ aggregated up to the random time $T$.

The decomposition induced by the process after the lapse of a slight time 
duration gives 


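$$\begin{aligned}
R(x) = E_x\Bigg[ &\int_0^h \exp\left\{-\int_0^t k(X(\tau))\,d\tau\right\} g(X(t))\,dt \\
&+ \left(\exp\left\{-\int_0^h k(X(\tau))\,d\tau\right\} - 1\right)\int_h^T \exp\left\{-\int_h^t k(X(\tau))\,d\tau\right\} g(X(t))\,dt \\
&+ \int_h^T \exp\left\{-\int_h^t k(X(\tau))\,d\tau\right\} g(X(t))\,dt \Bigg].
\end{aligned}$$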


The usual Taylor expansion combined with the law of total probabilities, 
Markov property, and obvious manipulations, lead to 


$$\tfrac12\sigma^2(x)R''(x) + \mu(x)R'(x) - k(x)R(x) + g(x) = 0 \tag{3.40}$$


coupled to the boundary conditions $R(a) = R(b) = 0$.

For later purposes, we need to evaluate


$$P(x) = E_x\left[\exp\left\{-\int_0^T k(X(\tau))\,d\tau\right\}\right]. \tag{3.41}$$


Applying (3.37) with the specification $f(x) = e^{-x}$ and $g(x) = k(x)$, and noting
that $f'(x) = -f(x)$ in the case at hand, shows that $P(x)$ solves


$$\tfrac12\sigma^2(x)P''(x) + \mu(x)P'(x) - k(x)P(x) = 0 \tag{3.42}$$


and obeys the boundary conditions P(a) = P(b) = 1. 
An extension of (3.41) concerns the evaluation of the functional 


$$Q(x) = E_x\left[\exp\left\{-\int_0^T k(X(\tau))\,d\tau\right\}\int_0^T f(X(\tau))\,d\tau\right] \tag{3.43}$$


with the usual smoothness assumptions stipulating that $f$ and $k$ are continuous
at $x$. The relevant differential equation becomes


$$\tfrac12\sigma^2(x)Q''(x) + \mu(x)Q'(x) - k(x)Q(x) + f(x)P(x) = 0 \tag{3.44}$$


with the function $P(x)$ being that of (3.41), together with the necessary boundary
conditions $Q(a) = Q(b) = 0$.


4: Some Concrete Cases of the Functional Calculations


With the solutions (3.10)-(3.12) at hand we expound a number of cases of 
interest in applications. 


A. STANDARD BROWNIAN MOTION 


Let $\{X(t), t \ge 0\}$ be standard Brownian motion whose infinitesimal parameters
are $\mu(x) = 0$, $\sigma^2(x) = 1$. Obviously we may take


$$s(x) = \exp\left\{-2\int^x \mu(\xi)\,d\xi\right\} = 1$$


and for the scale measure $S(x) = x$, so that $u(x)$, the probability of reaching $b$
prior to $a$ with initial state $x$, is


$$u(x) = \frac{x - a}{b - a}, \qquad a \le x \le b, \tag{4.1}$$

verifying the same result obtained by other means in Chapter 7. The speed
density (cf. (3.7)) is


$$m(\xi) = \frac{1}{\sigma^2(\xi)s(\xi)} = 1$$

and the Green function (3.15) for the interval $[a, b]$ is

$$G(x,\xi) = \begin{cases} \dfrac{2(x - a)(b - \xi)}{b - a}, & a \le x \le \xi \le b, \\[2ex] \dfrac{2(\xi - a)(b - x)}{b - a}, & a \le \xi \le x \le b. \end{cases} \tag{4.2}$$

Then a direct calculation from (3.12) gives

$$v(x) = E[T_{a,b} \mid X(0) = x] = (x - a)(b - x), \qquad a \le x \le b. \tag{4.3}$$


B. BROWNIAN MOTION WITH DRIFT 


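If $\{X(t), t \ge 0\}$ is Brownian motion with nonzero drift $\mu(x) = \mu$ and variance
$\sigma^2$, then


$$s(x) = \exp(-2\mu x/\sigma^2),$$

$$S(x) = A\exp(-2\mu x/\sigma^2) + B \qquad (A \text{ and } B \text{ constants}),$$

and


$$u(x) = \frac{e^{-2\mu x/\sigma^2} - e^{-2\mu a/\sigma^2}}{e^{-2\mu b/\sigma^2} - e^{-2\mu a/\sigma^2}},$$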


which also was obtained in Chapter 7.


C. THE WRIGHT-FISHER DIFFUSION MODEL FOR GENE FREQUENCY


Wright’s model for the fluctuation of gene frequency under random repro- 
duction was described in Example G of Section 2, Chapter 2. A more elaborate 
version involving mutation and selection pressures was set forth in Section 2 of 
this chapter, where a variety of approximating diffusion processes were 
highlighted. 

The simplest Wright—Fisher diffusion (a model for depicting fluctuations of 
gene frequency of the A-type within a population having both A- and a-types 
subject to selection influences) has the state space [0,1] and diffusion 
coefficients 


$$\mu(x) = \sigma x(1 - x), \qquad \sigma^2(x) = x(1 - x). \tag{4.4}$$


Recall that the state variable X(t) is the fraction of the A-type in a population 
of N individuals, where t = 1 corresponds to about N generations. 


(a) $\sigma = 0$ (No Selection Differences between the A-type and the a-type)


The boundary states 0 and 1 are absorbing points and signify that the
population exhibits no A-genes or all A-genes. Since $\mu(x) = 0$, then $s(x) = 1$ for
$0 < x < 1$ and $S(x) = x$. Hence, for any $a$, $b$, where $0 < a < b < 1$,


$$u(x) = \frac{x - a}{b - a} \qquad \text{for } 0 < a < x < b < 1, \tag{4.5}$$


gives the probability of reaching a fraction $b$ of A-types before reaching a
fraction $a$ when the initial A frequency is $x$. It is intuitive and readily justified
that this formula holds in the limit as $a \downarrow 0$ and $b \uparrow 1$. Therefore, if $X(0) = x$, the
probability of ultimate fixation into a population including only A-types is x, 
while fixation for the alternative type occurs with probability 1 — x. 

In the same way, in this particular example we shall obtain a valid result for 
the mean time to fixation if we compute v(x) using a = 0 and b = 1 provided 
we use the form in Eq. (3.12). We have 


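$$m(\xi) = \frac{1}{\xi(1 - \xi)}$$

and

$$G(x,\xi) = \begin{cases} 2x(1 - \xi)\,m(\xi) = \dfrac{2x}{\xi} & \text{for } 0 < x < \xi < 1, \\[2ex] 2(1 - x)\,\xi\,m(\xi) = \dfrac{2(1 - x)}{1 - \xi} & \text{for } 0 < \xi < x < 1. \end{cases}$$

Hence


$$\begin{aligned}
v(x) &= \int_0^1 G(x,\xi)\,d\xi \\
&= 2(1 - x)\int_0^x \frac{d\xi}{1 - \xi} + 2x\int_x^1 \frac{d\xi}{\xi} \\
&= -2[(1 - x)\log(1 - x) + x\log x].
\end{aligned} \tag{4.6}$$


The maximum occurs at $x = \tfrac12$, and we have then $v(\tfrac12) = 2\ln 2$. As mentioned
earlier, since one unit of time corresponds to $N$ generations, the value of (4.6)
in terms of generations should be multiplied by $N$.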
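
As a quick check on (4.6), the following sketch compares the closed form with a direct midpoint-rule integration of the Green function $G(x,\xi)$ given above. The function names and the quadrature resolution are illustrative choices for this example only.

```python
import math

def mean_fixation_time(x):
    """Closed form (4.6), in diffusion time units (one unit is about N generations)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -2.0 * ((1.0 - x) * math.log(1.0 - x) + x * math.log(x))

def mean_fixation_time_quadrature(x, n=20000):
    """Midpoint-rule integral over (0, 1) of G(x, xi),
    with G = 2x/xi for xi > x and G = 2(1-x)/(1-xi) for xi <= x."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        xi = (i + 0.5) * h
        total += 2.0 * x / xi if xi > x else 2.0 * (1.0 - x) / (1.0 - xi)
    return total * h

# Mean number of generations to fixation for a population of N individuals.
N = 1000
generations = N * mean_fixation_time(0.5)
```

The two functions should agree to several decimal places for any interior $x$, since the integrand is bounded by 2 on $(0, 1)$.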


Expected local time. It is of interest to calculate the expected time, before
fixation, that the A frequency has values in $(x_1, x_2)$ where the initial A
frequency was $x$. This is equivalent to evaluating


$$w(x) = E\left[\int_0^T g(X(t))\,dt \,\Big|\, X(0) = x\right] \tag{4.7}$$

for the function


$$g(\xi) = \begin{cases} 1 & \text{for } x_1 < \xi < x_2, \\ 0 & \text{otherwise}, \end{cases}$$


where $T$ represents the time to some fixation. Following the procedure used in
Remark 3.3, we have


$$w(x) = \int_{x_1}^{x_2} G(x,\xi)\,d\xi,$$


which reduces to


$$w(x) = \begin{cases} 2x\ln\dfrac{x_2}{x_1} & \text{for } x \le x_1, \\[2ex] 2\left[(1 - x)\ln\dfrac{1 - x_1}{1 - x} + x\ln\dfrac{x_2}{x}\right] & \text{for } x_1 < x < x_2, \\[2ex] 2(1 - x)\ln\dfrac{1 - x_1}{1 - x_2} & \text{for } x \ge x_2. \end{cases}$$


The measure function $G(x,\xi)\,d\xi$ can thereby be interpreted as the expected
time the process spends in $(\xi, \xi + d\xi)$ prior to fixation starting from $X(0) = x$.
(See Remark 3.3.) 

(b) $\sigma > 0$ (Selection Favors the A-type over the a-type)

Recall from (4.4) that in this case


$$\mu(\xi) = \sigma\xi(1 - \xi) \qquad \text{and} \qquad \sigma^2(\xi) = \xi(1 - \xi).$$


It follows, referring to (3.5), that


$$s(x) = \exp\left\{-\int^x \frac{2\mu(\xi)}{\sigma^2(\xi)}\,d\xi\right\} = e^{-2\sigma x},$$


and therefore


$$S(x) = \frac{1 - e^{-2\sigma x}}{1 - e^{-2\sigma}} \tag{4.8}$$


is the probability of fixation of the population in the A-type where $X(0) = x$ is
the initial fraction of A-individuals. In the usual population genetics literature
on the subject, the scaling of the selection parameter $\sigma$ of (4.4) is taken as
$\sigma = N\gamma$, and then (4.8) becomes

$$S(x) = \frac{1 - e^{-2N\gamma x}}{1 - e^{-2N\gamma}},$$

which for $x = 1/N$ (that is, where the initial population consists of a single
A-type and all others a-types) is approximately $S(1/N) = 2\gamma/(1 - e^{-2N\gamma})$ ($\approx 2\gamma$ if
$N\gamma$ is large).
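
The fixation probability (4.8) and its single-mutant approximation are easy to evaluate directly. The sketch below is illustrative only; the function names are hypothetical, and `math.expm1` is used merely to keep precision when $2\sigma x$ is small.

```python
import math

def fixation_probability(x, sigma):
    """Probability (4.8) that the A-type fixes, starting from frequency x."""
    if sigma == 0.0:
        return x  # neutral case, where S(x) = x
    # (1 - e^{-2 sigma x}) / (1 - e^{-2 sigma}), written with expm1 for accuracy
    return math.expm1(-2.0 * sigma * x) / math.expm1(-2.0 * sigma)

def single_mutant_fixation(N, gamma):
    """Fixation probability of one A-individual among N, with sigma = N * gamma."""
    return fixation_probability(1.0 / N, N * gamma)

# With N = 1000 and gamma = 0.01 the value is close to 2 * gamma.
p = single_mutant_fixation(1000, 0.01)
```

When $N\gamma$ is large the result is close to $2\gamma$, in line with the approximation stated above.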


D. WRIGHT-FISHER MODEL WITH ONE-WAY MUTATION 


Suppose no mutations of a to A are allowed, while mutations of A to a occur
at a rate $\alpha = \gamma/N$. The relevant approximating diffusion is that of Case (a),
Example F of Section 2 with diffusion coefficients


$$\mu(x) = -\gamma x \qquad \text{and} \qquad \sigma^2(x) = x(1 - x) \qquad \text{for } 0 < x < 1.$$


In this case we obtain


$$s(x) = \exp\left(\int^x \frac{2\gamma}{1 - \xi}\,d\xi\right) = \exp[-2\gamma\ln(1 - x)] = \frac{1}{(1 - x)^{2\gamma}}. \tag{4.9}$$

The scale function for the interval $(0, 1)$ can be taken to be

$$S(x) = -\frac{(1 - x)^{-2\gamma+1}}{1 - 2\gamma} \tag{4.10}$$

so that $S(x)$ is increasing. Notice in all circumstances that

$$u(x) = \frac{S(x) - S(a)}{S(b) - S(a)} = \frac{(1 - x)^{-2\gamma+1} - (1 - a)^{-2\gamma+1}}{(1 - b)^{-2\gamma+1} - (1 - a)^{-2\gamma+1}} \tag{4.11}$$


is the probability of reaching $b$ before $a$. For $b \to 1$, then $u(x) \to 0$ when $2\gamma > 1$,
while if $2\gamma < 1$ the limit is positive. Thus, only if the mutation rate is
sufficiently high is the endpoint 1 (signifying a state of only A-types)
unattainable. Of course, fixation of the A-gene is never possible, but where $2\gamma < 1$ the
analysis at the boundary point 1 is more subtle. We shall deal with these kinds
of problems in the discussion and classification of boundary behavior for
diffusion processes in Sections 6-8.

From the biology of the model we should expect ultimate fixation of the
a-type, and this indeed occurs. The time of fixation is then $T_0$, the hitting
time of state 0. As just seen, when $2\gamma > 1$, state 1 is unattainable and thus
$T_0 = T_{0,1}$. We may compute the expected time to fixation using


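$$E[T_0 \mid X(0) = x] = \lim_{a \downarrow 0,\; b \uparrow 1} E[T_{a,b} \mid X(0) = x],$$

where $T_{a,b}$ is the first time the level $a$ or $b$ is reached, $0 < a < b < 1$. Following
(3.7), we set $m(\xi) = 1/[\xi(1 - \xi)s(\xi)] = (1 - \xi)^{2\gamma-1}/\xi$. Now, according to (3.12),
when $2\gamma > 1$ we obtain


$$\begin{aligned}
E[T_{a,b} \mid X(0) = x] &= \int_a^b G_{a,b}(x,\xi)\,d\xi \\
&= 2u(x)\int_x^b [S(b) - S(\xi)]\,m(\xi)\,d\xi + 2[1 - u(x)]\int_a^x [S(\xi) - S(a)]\,m(\xi)\,d\xi \\
&= \frac{2}{2\gamma - 1}\left[\frac{(1 - x)^{-2\gamma+1} - (1 - a)^{-2\gamma+1}}{(1 - b)^{-2\gamma+1} - (1 - a)^{-2\gamma+1}}\right]\int_x^b \frac{(1 - b)^{-2\gamma+1}(1 - \xi)^{2\gamma-1} - 1}{\xi}\,d\xi \\
&\quad + \frac{2}{2\gamma - 1}\left[\frac{(1 - b)^{-2\gamma+1} - (1 - x)^{-2\gamma+1}}{(1 - b)^{-2\gamma+1} - (1 - a)^{-2\gamma+1}}\right]\int_a^x \frac{1 - (1 - a)^{-2\gamma+1}(1 - \xi)^{2\gamma-1}}{\xi}\,d\xi,
\end{aligned} \tag{4.12}$$

where $u(x)$ is that appearing in (4.11).

Letting $a$ decrease to 0 and $b$ increase to 1 and noting that, when $2\gamma > 1$, then
$(1 - b)^{-2\gamma+1} \to \infty$ and correspondingly $u(x) \to 0$, we obtain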


$$\begin{aligned}
E[T_0 \mid X(0) = x] &= \lim_{a \downarrow 0,\; b \uparrow 1} E[T_{a,b} \mid X(0) = x] \\
&= \frac{2[(1 - x)^{-2\gamma+1} - 1]}{2\gamma - 1}\int_x^1 \frac{(1 - \xi)^{2\gamma-1}}{\xi}\,d\xi + \frac{2}{2\gamma - 1}\int_0^x \frac{1 - (1 - \xi)^{2\gamma-1}}{\xi}\,d\xi \\
&= 2\int_0^x \frac{1}{(1 - \xi)^{2\gamma}}\left[\int_\xi^1 \frac{(1 - \eta)^{2\gamma-1}}{\eta}\,d\eta\right]d\xi,
\end{aligned} \tag{4.13}$$

the last equation resulting by integration by parts of the second integral, where
one of the boundary terms and the first integral cancel. The above formula
also works for $2\gamma = 1$.

In the case $2\gamma < 1$, the state 1 can be attained prior to fixation, and so
$T_0 \ne T_{0,1}$ with positive probability. The limit procedure of (4.13) yields the
mean of $T_{0,1}$ and not the mean of $T_0$. The mean fixation time


$$v_0(x) = E[T_0 \mid X(0) = x]$$


satisfies the same differential equation $\tfrac12\sigma^2(x)v_0''(x) + \mu(x)v_0'(x) = -1$ for
$0 < x < 1$, and the boundary condition $v_0(0) = 0$ remains valid. However, the
boundary condition $v_0(1) = 0$ is no longer correct and must be replaced by an
appropriate alternative that correctly models the phenomenon being studied.
This example highlights the importance of understanding boundary behavior,
the subject of Sections 6-8. The analysis will be done in another manner. We
seek a solution of


$$\frac{x(1 - x)}{2}\,v_0''(x) - \gamma x\,v_0'(x) + 1 = 0 \tag{4.14}$$


obviously obeying the boundary condition $v_0(0) = 0$. This requirement alone
does not uniquely determine the solution. On intuitive grounds we should
expect $v_0(x)$ to be monotone increasing, and so we add this condition. The
general solution of (4.14) has the form


$$v_0(x) = 2\int_0^x \frac{1}{(1 - \xi)^{2\gamma}}\left[\int_\xi^1 \frac{(1 - \eta)^{2\gamma-1}}{\eta}\,d\eta\right]d\xi + A(1 - x)^{1-2\gamma} + B,$$


where $A$ and $B$ are arbitrary constants [$A(1 - x)^{1-2\gamma} + B$ is the general
solution of the homogeneous equation $\tfrac12\xi(1 - \xi)y''(\xi) - \gamma\xi y'(\xi) = 0$]. The
condition $v_0(0) = 0$ implies $A = -B$. The fact that $v_0(x)$ is increasing near 1 means
that $A \le 0$. With these conditions fulfilled we write


$$v_0(x) = 2\int_0^x \frac{1}{(1 - \xi)^{2\gamma}}\left[\int_\xi^1 \frac{(1 - \eta)^{2\gamma-1}}{\eta}\,d\eta\right]d\xi + B[1 - (1 - x)^{1-2\gamma}], \qquad B \ge 0.$$


The constant $B$ is forced to be zero by imposing either of the following two
constraints:
(i) $v_0(x)$ is the smallest positive monotone solution satisfying $v_0(0) = 0$, or
(ii) $v_0(0) = 0$ and $v_0'(1) < \infty$.


An intuitive argument shows that $v_0'(1) = 1/\gamma$. Because $\mu(1) = -\gamma$ while
$\sigma^2(1) = 0$, the motion at $x = 1$ is essentially deterministic, and given $X(t) = 1$,
then $X(t + h) = 1 - \gamma h$. Then


$$\begin{aligned}
v_0(1) &= h + E[v_0(X(t + h)) \mid X(t) = 1] = h + v_0(1 - \gamma h) + o(h) \\
&= h + v_0(1) - \gamma h\,v_0'(1) + o(h).
\end{aligned}$$

Subtracting $v_0(1)$, dividing by $h$, and sending $h$ to zero shows $v_0'(1) = 1/\gamma$.

The nub of the above discussion is that where $2\gamma < 1$ some further
stipulations are necessary to extract the proper solution of (4.14)
corresponding to $E[T_0 \mid X(0) = x] = v_0(x)$. We emphasize again that the last formula is
not uniquely defined unless the behavior of the diffusion at the state point
$x = 1$ is made more explicit.


E. A CASH INVENTORY MODEL 


Let $Z(t)$ be the amount of cash an organization has on hand at time $t$. We
suppose that in the absence of intervention $\{Z(t), t \ge 0\}$ behaves as a Brownian
motion process with zero drift and variance parameter $\sigma^2 = 1$.

Holding cash involves an opportunity cost since this cash could be 
invested. We therefore suppose that holding cash at level z incurs costs at the 
rate cz. Since transactions into and out of cash are also costly, we include a 
cost K for each transaction. 

Consider the following (s, S)-type policy for controlling the cash level: “If 
the cash reaches level S, invest S — s and reduce the cash level to s. This 
transaction incurs a cost of K. If the cash ever dips to zero, sell investments, 
and bring the cash level up to s. This again costs K.” 

Consider a cycle to be from one intervention returning the cash level to s to 
the next such intervention. The long-run cost per unit time will be the expected 
cost per cycle divided by the expected cycle time, or 


$$(K + A)/B,$$

where

$$A = E\left[\int_0^T cZ(t)\,dt \,\Big|\, Z(0) = s\right]$$

and

$$B = E[T \mid Z(0) = s].$$


[The above formulas can be derived by direct renewal arguments (cf. Chapter
5).] Here

$$T = \min\{t \ge 0 : Z(t) = S \text{ or } Z(t) = 0\}.$$

From Eq. (4.3) in the Brownian motion example we have (using $a = 0$ and
$b = S$)

$$B = v(s) = s(S - s).$$


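Applying (3.14) with $g(z) = cz$, and taking $G(x,\xi)$ appropriate for the interval
$(a, b) = (0, S)$ (as given by (4.2)), we obtain


$$E\left[\int_0^T g(Z(t))\,dt \,\Big|\, Z(0) = x\right] = w(x) = \int_0^S G(x,\xi)g(\xi)\,d\xi = c\left(\tfrac13 xS^2 - \tfrac13 x^3\right),$$

and in particular

$$A = w(s) = \tfrac13 cs(S^2 - s^2).$$

The average cost is

$$\frac{K + A}{B} = \frac{K + \tfrac13 cs(S^2 - s^2)}{s(S - s)} = \frac{K}{s(S - s)} + \frac{c(S + s)}{3}.$$


To minimize, change variables, letting $\tilde S = S - s$. The average cost is


$$C(s, \tilde S) = \frac{K}{s\tilde S} + \frac{c(\tilde S + 2s)}{3},$$


which we differentiate to get


$$\frac{\partial C}{\partial s} = -\frac{K}{s^2\tilde S} + \frac{2c}{3} \qquad \text{and} \qquad \frac{\partial C}{\partial \tilde S} = -\frac{K}{s\tilde S^2} + \frac{c}{3}.$$

We equate the derivatives to zero to obtain the cost minimizing $\tilde S = \tilde S^*$ and
$s = s^*$:


$$(s^*)^2\tilde S^* = \frac{3K}{2c}, \qquad s^*(\tilde S^*)^2 = \frac{3K}{c}, \tag{4.15}$$

whence

$$\frac{s^*}{\tilde S^*} = \frac12 \tag{4.16}$$

or

$$s^* = \tfrac12(S^* - s^*) \qquad \text{or} \qquad s^* = \tfrac13 S^*. \tag{4.17}$$

We substitute (4.16) in (4.15) to get

$$(s^*)^3 = \frac{3K}{4c}.$$


The optimal control parameters are


$$s^* = \left(\frac{3K}{4c}\right)^{1/3} \qquad \text{and} \qquad S^* = 3s^*.$$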
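
A brief numerical sanity check of the optimal $(s, S)$ policy is sketched below. The function names are hypothetical; the cost formula is the average cost $K/[s(S - s)] + c(S + s)/3$ derived above.

```python
def average_cost(s, S, K, c):
    """Long-run cost per unit time K / (s (S - s)) + c (S + s) / 3 of the (s, S) policy."""
    return K / (s * (S - s)) + c * (S + s) / 3.0

def optimal_policy(K, c):
    """Closed-form minimizer: s* = (3K / (4c))^(1/3) and S* = 3 s*."""
    s_star = (3.0 * K / (4.0 * c)) ** (1.0 / 3.0)
    return s_star, 3.0 * s_star

# Example with transaction cost K = 2 and holding cost rate c = 0.5.
s_star, S_star = optimal_policy(2.0, 0.5)
best = average_cost(s_star, S_star, 2.0, 0.5)
```

Perturbing either parameter away from `(s_star, S_star)` should raise the average cost, which is a convenient way to confirm the first-order conditions (4.15).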


F. A BROWNIAN MOTION CONTROL PROBLEM


Let X(t) be the state of some system which evolves stochastically from an 
initial position X(0) = 0. While the system is evolving, costs are accrued at a 
rate proportional to the square of the state with proportionality constant c. To 
avoid these costs, the observer may, at his will, pay a fixed cost K and then 
restart the process at zero. Thus, if T is the time of restart, then the total 
operating and restart costs up to time $T$ are $W(T) = K + \int_0^T c[X(s)]^2\,ds$.

We are going to assume that $\{X(t), t \ge 0\}$ is a Brownian motion process, at
least until restarted, that the drift parameter is zero, and that the variance
parameter is $\sigma^2 = 1$. We shall restart at times


$$T = \min\{t \ge 0 : |X(t)| = \lambda\}$$

and try to choose an optimal $\lambda$. Our long-run cost per unit time is the expected
cost over a cycle divided by the expected cycle time. Since
$E[W(T) \mid X(0) = 0] = K + A$ with $A = E[\int_0^T c[X(s)]^2\,ds \mid X(0) = 0]$, we need to compute
$A$ and $B = E[T \mid X(0) = 0]$. From Eq. (4.3) in the
Brownian motion example we have, using $a = -\lambda$ and $b = \lambda$,

$$B = v(0) = \lambda^2.$$


Applying (3.11) with $g(x) = cx^2$, we obtain


$$w(x) = 2u(x)\int_x^{\lambda} (\lambda - \xi)g(\xi)\,d\xi + 2[1 - u(x)]\int_{-\lambda}^x (\xi + \lambda)g(\xi)\,d\xi$$


with

$$u(x) = \frac{x + \lambda}{2\lambda} \qquad \text{and} \qquad g(\xi) = c\xi^2.$$


Since $u(0) = \tfrac12$, by straightforward integration and simplification we get

$$A = w(0) = \tfrac16 c\lambda^4.$$

The average cost is

$$\frac{K + A}{B} = \frac{K}{\lambda^2} + \frac{c\lambda^2}{6}.$$

We differentiate with respect to $\lambda$ and equate to zero to obtain the
cost-minimizing value $\lambda = \lambda^*$ satisfying


$$0 = -\frac{2K}{(\lambda^*)^3} + \frac{c\lambda^*}{3}$$


or


$$\lambda^* = (6K/c)^{1/4}.$$
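
The running-cost integral and the optimal threshold can be checked numerically. In the sketch below (function names are illustrative), the Green function at $x = 0$ for standard Brownian motion on $(-\lambda, \lambda)$ is taken from (4.2), which reduces to $G(0,\xi) = \lambda - |\xi|$.

```python
def expected_running_cost(lam, c, n=20000):
    """Midpoint-rule value of int G(0, xi) c xi^2 d xi over (-lam, lam)."""
    h = 2.0 * lam / n
    total = 0.0
    for i in range(n):
        xi = -lam + (i + 0.5) * h
        green = lam - abs(xi)  # G(0, xi) from (4.2) with a = -lam, b = lam
        total += green * c * xi * xi
    return total * h

def control_cost(lam, K, c):
    """Average cost per unit time K / lam^2 + c lam^2 / 6."""
    return K / lam ** 2 + c * lam ** 2 / 6.0

def optimal_threshold(K, c):
    """Closed-form minimizer lam* = (6K / c)^(1/4)."""
    return (6.0 * K / c) ** 0.25

# Example: restart cost K = 1 and running cost rate c = 6.
lam_star = optimal_threshold(1.0, 6.0)
```

The quadrature should reproduce $c\lambda^4/6$, and `control_cost` should be smallest at `optimal_threshold(K, c)`.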


5: The Nature of Backward and Forward Equations 
and Calculation of Stationary Measures 


Let $\{X(t), t \ge 0\}$ be a regular time homogeneous diffusion process on the open
interval $I = (l, r)$. We designate by $P(t, x, y) = \Pr\{X(t) \le y \mid X(0) = x\}$ the
transition distribution function of $X(t)$ subject to the initial distribution


$$P(0, x, y) = \begin{cases} 1, & \text{if } x \le y, \\ 0, & \text{if } x > y, \end{cases} \tag{5.1}$$

i.e., a point distribution concentrating at $x$. We shall assume throughout this
chapter that $P(t, x, y)$ derives from a continuous density on $(l, r)$, namely,

$$\frac{\partial P(t, x, y)}{\partial y} = p(t, x, y) \qquad \text{for } t > 0. \tag{5.2}$$

(The existence of a continuous density in (5.2) is not an assumption for a
regular diffusion which has smooth infinitesimal coefficients. The analytic 
validation of this property is beyond the level of this text.) 


KOLMOGOROV BACKWARD DIFFERENTIAL EQUATION 


Our next objective in the spirit of Section 3 is to derive a partial differential 
equation for the function 


$$u(t, x) = E[g(X(t)) \mid X(0) = x], \tag{5.3}$$


where g(x) is bounded and piecewise continuous on I. 
Under mild conditions (those of Section 1 suffice) we will ascertain that 
u(t, x) satisfies the partial differential equation 


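$$\frac{\partial u}{\partial t} = \frac12\sigma^2(x)\frac{\partial^2 u}{\partial x^2} + \mu(x)\frac{\partial u}{\partial x} \tag{5.4}$$

with the initial condition $u(0+, x) = g(x)$, where $u(0+, x) = \lim_{h \downarrow 0} u(h, x)$.
The specification $g(\eta) = 1$ for $\eta \le y$ and $0$ for $\eta > y$ produces the transition
distribution function


$$u(t, x) = P(t, x, y).$$


Equation (5.4) in this case is referred to as the Kolmogorov backward equation,
that is,


$$\frac{\partial P(t, x, y)}{\partial t} = \frac12\sigma^2(x)\frac{\partial^2 P(t, x, y)}{\partial x^2} + \mu(x)\frac{\partial P(t, x, y)}{\partial x}, \tag{5.5}$$


applicable for $t > 0$ and $l < x, y < r$. The initial condition attendant to (5.5) is


$$P(0+, x, y) = \begin{cases} 1 & \text{if } x < y, \\ 0 & \text{if } x > y. \end{cases} \tag{5.6}$$

The transition density $p(t, x, y)$ also satisfies the Kolmogorov backward
equation:

$$\frac{\partial p}{\partial t} = \frac12\sigma^2(x)\frac{\partial^2 p}{\partial x^2} + \mu(x)\frac{\partial p}{\partial x} \tag{5.7}$$


for $t > 0$ and $x, y$ in $(l, r)$.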


Remark. It is not true that (5.5) and (5.7) always admit unique solutions, even 
if we require that P(t, x, y) be a distribution function in y for each t and x. As 
an example, the transition distribution functions for absorbing and reflecting 
Brownian motion both satisfy (5.5) but are manifestly unequal. The problem is 
that (5.5) makes no mention of the behavior of the process at the boundaries of 
the state space, exactly where absorbing and reflecting Brownian motion differ.

The assertions of (5.4)-(5.7) are surprisingly difficult to prove. In particular,
it is rather formidable to prove that the functions P, p, and u are differentiable 
in t and twice differentiable in x. However, beginning from this point it is not 
hard to show that u(t, x) satisfies (5.4). 

We proceed to the proof of (5.4) assuming that u(t, x) is differentiable in t 
and possesses two continuous derivatives in x. Begin with 


$$E[g(X(h + t)) \mid X(h) = x] = u(t, x)$$

or

$$E[g(X(h + t)) \mid X(h)] = u(t, X(h))$$

so that

$$u(t + h, x) = E[g(X(h + t)) \mid X(0) = x] = E\big[E[g(X(h + t)) \mid X(h)] \,\big|\, X(0) = x\big]$$


(by the law of total probability) $= E[u(t, X(h)) \mid X(0) = x]$. Subtracting $u(t, x)$
from both sides and dividing by $h$ yields


$$\frac{u(t + h, x) - u(t, x)}{h} = \frac1h E[u(t, X(h)) - u(t, x) \mid X(0) = x]. \tag{5.8}$$


Under our stipulation we know that u(t, x) is differentiable in t and possesses 
two continuous derivatives in x. (In most applications this assumption would 
be satisfied.) As $h$ approaches 0 the left-hand side of (5.8) passes into $\partial u/\partial t$. The
right-hand side is evaluated by implementing a Taylor series expansion. First,
under the condition $X(0) = x$, let $\epsilon > 0$ be an arbitrary positive number, to be
determined later, and let $\Delta X = X(h) - x$ and

$$\Delta X_\epsilon = \begin{cases} \Delta X & \text{if } |\Delta X| \le \epsilon, \\ 0 & \text{if } |\Delta X| > \epsilon. \end{cases}$$

Since $g$ is a bounded function, say, $|g(x)| \le A$ for all $x$, it follows that
$|u(t, x)| \le A$ for all $t$ and $x$ and

$$E[|u(t, x + \Delta X) - u(t, x + \Delta X_\epsilon)|] \le 2A\Pr\{|\Delta X| > \epsilon \mid X(0) = x\}.$$


The right-hand side is of order less than $h$ as $h \to 0$ (cf. (1.1) of Section 1). Thus,
the right-hand side of (5.8) may be evaluated through


$$\frac1h E[u(t, X(h)) - u(t, x) \mid X(0) = x] = \frac1h E[u(t, x + \Delta X_\epsilon) - u(t, x) \mid X(0) = x] + o(1).$$


Expanding in a Taylor series, we get


$$u(t, x + \Delta X_\epsilon) - u(t, x) = \Delta X_\epsilon\,\frac{\partial u}{\partial x}(t, x) + \frac12(\Delta X_\epsilon)^2\,\frac{\partial^2 u}{\partial x^2}(t, x + Z) \tag{5.9}$$

where $Z$ is a random variable satisfying $|Z| \le |\Delta X_\epsilon| \le \epsilon$. Let $\delta > 0$ be given.
Since $\partial^2 u/\partial x^2$ is continuous in $x$ we may determine $\epsilon$ sufficiently small to
ensure that


$$\left|\frac{\partial^2 u}{\partial x^2}(t, x + Z) - \frac{\partial^2 u}{\partial x^2}(t, x)\right| < \delta \tag{5.10}$$


in (5.9). Divide by $h$, take expected values, let $h \to 0$, and use (5.10) to conclude


$$\limsup_{h \downarrow 0}\left|\frac{E[u(t, x + \Delta X_\epsilon) - u(t, x) \mid X(0) = x]}{h} - \mu(x)\frac{\partial u}{\partial x} - \frac12\sigma^2(x)\frac{\partial^2 u}{\partial x^2}\right| \le \frac12\sigma^2(x)\,\delta. \tag{5.11}$$


Since $\delta > 0$ is arbitrary, it follows that the limit in (5.11) is 0. Thus, if $h \to 0$ in
(5.8) we obtain


$$\frac{\partial u}{\partial t} = \frac12\sigma^2(x)\frac{\partial^2 u}{\partial x^2} + \mu(x)\frac{\partial u}{\partial x} \tag{5.12}$$

for $t > 0$, $l < x < r$. The appropriate initial condition is $\lim_{t \downarrow 0} u(t, x) = g(x)$.
Now suppose $g$ is the function


$$g(\xi) = \begin{cases} 1 & \text{if } \xi \le y, \\ 0 & \text{if } \xi > y. \end{cases}$$


Then $u(t, x) = E[g(X(t)) \mid X(0) = x] = \Pr\{X(t) \le y \mid X(0) = x\} = P(t, x, y)$ so
that (5.12) would become the backward equation (5.5). However, the function
$g(\cdot)$ just specified is not continuous as required in the preceding development.
But $g$ can be suitably approximated by smooth functions, and through
such approximations the argument leading to the backward equation can be
justified with some effort. Equation (5.7) is obtained formally by differentiating
(5.5) with respect to $y$.

If the diffusion process $\{X(t), t \ge 0\}$ is not assumed time homogeneous then
(5.7) is modified as follows: Consider, for $t > s$ and $l < x < r$,


$$u(t, s, x) = E[g(X(t)) \mid X(s) = x], \tag{5.13}$$


displaying its dependence on the initial time $s$ and state $x$ and current time $t$.
The corresponding backward differential equation for $u(t, s, x)$ is


$$-\frac{\partial u(t, s, x)}{\partial s} = \frac12\sigma^2(x, s)\frac{\partial^2 u(t, s, x)}{\partial x^2} + \mu(x, s)\frac{\partial u(t, s, x)}{\partial x} \tag{5.14}$$


coupled to the initial condition $\lim_{s \uparrow t} u(t, s, x) = g(x)$.

Example 1. Standard Brownian motion $B(t)$. The associated backward
differential equation is the so-called heat equation

$$\frac{\partial p}{\partial t} = \frac12\frac{\partial^2 p}{\partial x^2} \qquad \text{for } t > 0,\; -\infty < x < \infty. \tag{5.15}$$

The unique transition probability density function satisfying (5.15) together
with the appropriate initial condition at $t = 0$ is the Gauss kernel


$$\varphi(t, x, y) = \frac{1}{\sqrt{2\pi t}}\exp\left\{-\frac{(y - x)^2}{2t}\right\} \qquad \text{for } t > 0,\; -\infty < x, y < \infty. \tag{5.16}$$


The straightforward task of verifying that (5.16) satisfies (5.15) is left to the
reader.
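
Before carrying out the verification by hand, a finite-difference check can be reassuring. The sketch below is an illustration with hypothetical function names and an arbitrarily chosen step size; it evaluates $\partial\varphi/\partial t - \tfrac12\,\partial^2\varphi/\partial x^2$ for the Gauss kernel (5.16) using central differences.

```python
import math

def gauss_kernel(t, x, y):
    """Gauss kernel (5.16): density at y of a normal law with mean x and variance t."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def heat_residual(t, x, y, h=1e-3):
    """Central-difference estimate of phi_t - (1/2) phi_xx at (t, x, y)."""
    phi_t = (gauss_kernel(t + h, x, y) - gauss_kernel(t - h, x, y)) / (2.0 * h)
    phi_xx = (gauss_kernel(t, x + h, y) - 2.0 * gauss_kernel(t, x, y)
              + gauss_kernel(t, x - h, y)) / (h * h)
    return phi_t - 0.5 * phi_xx

# The residual should be small (of the order of the discretization error).
residual = heat_residual(1.0, 0.3, -0.4)
```

A residual near zero at a handful of test points is of course no proof, but it catches sign and scaling slips quickly.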


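Example 2. Brownian motion with drift $W(t) = \sigma B(t) + \mu t$. Associated with the
infinitesimal parameters $\mu(x) = \mu$ and $\sigma^2(x) = \sigma^2$ for $-\infty < x < \infty$ is the
backward equation


$$\frac{\partial p}{\partial t} = \frac12\sigma^2\frac{\partial^2 p}{\partial x^2} + \mu\frac{\partial p}{\partial x} \qquad \text{for } t > 0,\; -\infty < x, y < \infty. \tag{5.17}$$


The unique transition probability density function satisfying (5.17) together
with the appropriate initial condition at $t = 0$ is


$$p(t, x, y) = \varphi(\sigma^2 t,\, x + \mu t,\, y), \tag{5.18}$$


where $\varphi(t, x, y)$ is the Gauss kernel in (5.16).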
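We will verify that (5.18) satisfies (5.17) by using the chain rule for
differentiation. Introduce the notation


$$\varphi_x = \frac{\partial\varphi}{\partial x}, \qquad \varphi_{xx} = \frac{\partial^2\varphi}{\partial x^2}, \qquad \text{and} \qquad \varphi_t = \frac{\partial\varphi}{\partial t}. \tag{5.19}$$

Then (5.15) becomes

$$\varphi_t = \tfrac12\varphi_{xx}. \tag{5.20}$$

Differentiation of (5.18) then gives

$$\frac{\partial p}{\partial t} = \sigma^2\varphi_t + \mu\varphi_x, \qquad \frac{\partial p}{\partial x} = \varphi_x, \qquad \frac{\partial^2 p}{\partial x^2} = \varphi_{xx}.$$


In conjunction with (5.20), then (5.17) is verified.

Example 3. The Ornstein-Uhlenbeck process $V(t)$. The backward equation


$$\frac{\partial p}{\partial t} = \frac12\sigma^2\frac{\partial^2 p}{\partial x^2} - \alpha x\frac{\partial p}{\partial x} \qquad \text{for } t > 0,\; -\infty < x, y < \infty \tag{5.21}$$


corresponds to the coefficients $\mu(x) = -\alpha x$ and $\sigma^2(x) = \sigma^2$ for $-\infty < x < \infty$.
The unique solution as a transition probability density function is


$$p(t, x, y) = \varphi\left(\frac{\sigma^2}{2\alpha}(1 - e^{-2\alpha t}),\; xe^{-\alpha t},\; y\right), \tag{5.22}$$


where $\varphi(t, x, y)$ is the Gauss kernel given in (5.16). The verification that (5.22)
satisfies (5.21) uses the chain rule for differentiation in a manner paralleling but
more arduous than that used in Example 2.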
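
Since the hand verification is tedious, a numerical spot check may be useful. The sketch below (hypothetical function names, arbitrary step size) evaluates the residual $p_t - \tfrac12\sigma^2 p_{xx} + \alpha x\,p_x$ of (5.21) for the density (5.22) by central differences.

```python
import math

def gauss_kernel(t, x, y):
    """Gauss kernel (5.16): density at y of a normal law with mean x and variance t."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def ou_density(t, x, y, alpha, sigma):
    """Transition density (5.22) of the Ornstein-Uhlenbeck process."""
    variance = sigma ** 2 * (1.0 - math.exp(-2.0 * alpha * t)) / (2.0 * alpha)
    return gauss_kernel(variance, x * math.exp(-alpha * t), y)

def backward_residual(t, x, y, alpha, sigma, h=1e-3):
    """Central-difference estimate of p_t - (sigma^2 / 2) p_xx + alpha x p_x."""
    def p(tt, xx):
        return ou_density(tt, xx, y, alpha, sigma)
    p_t = (p(t + h, x) - p(t - h, x)) / (2.0 * h)
    p_x = (p(t, x + h) - p(t, x - h)) / (2.0 * h)
    p_xx = (p(t, x + h) - 2.0 * p(t, x) + p(t, x - h)) / (h * h)
    return p_t - 0.5 * sigma ** 2 * p_xx + alpha * x * p_x

# The residual should be close to zero at any interior point.
residual = backward_residual(0.8, 0.5, -0.2, alpha=0.7, sigma=1.3)
```

The parameter values in the usage line are arbitrary; any $\alpha > 0$, $\sigma > 0$, and $t$ comfortably larger than the step `h` should behave similarly.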

The Ornstein-Uhlenbeck process $\{V(t), t \ge 0\}$ can be realized from
standard Brownian motion $B(t)$ through the succinct representation


$$V(t) = e^{-\alpha t}\,B\left(\frac{\sigma^2(e^{2\alpha t} - 1)}{2\alpha}\right) \tag{5.23}$$


requiring a deterministic change of the time clock and a rescaling of the state
variable.

It is manifest that the process $V(t)$ in (5.23) inherits continuous
sample paths from Brownian motion, and that $V(t)$ is a Markov process. It
remains to identify the infinitesimal parameters of $V(t)$ as those of (5.21). To
this end it is convenient to abbreviate $\tau = \sigma^2(e^{2\alpha t} - 1)/2\alpha$, and then

$$\begin{aligned}
E[V(t + h) - V(t) \mid V(t) = x]
&= e^{-\alpha(t+h)}\,E\left[B\left(\frac{\sigma^2(e^{2\alpha t}e^{2\alpha h} - 1)}{2\alpha}\right) - xe^{\alpha(t+h)} \,\Big|\, B(\tau) = xe^{\alpha t}\right] \\
&= e^{-\alpha(t+h)}(1 - e^{\alpha h})\,xe^{\alpha t} = -\alpha xh + o(h).
\end{aligned} \tag{5.24}$$


It is clear from (5.24) that the drift coefficient of $V(t)$ is $\mu_V(x) = -\alpha x$.
The calculation $\lim_{h \downarrow 0} (1/h)E[\{V(t + h) - V(t)\}^2 \mid V(t) = x] = \sigma^2$ is done
similarly.


The representation (5.23) shows that the Ornstein-Uhlenbeck process is a
Gaussian process; i.e., the finite-dimensional distributions are multivariate
normal with mean zero and covariance kernel


$$E[V(t)V(s)] = \sigma^2 e^{-\alpha(t+s)}\,\frac{e^{2\alpha s} - 1}{2\alpha} \qquad \text{for } s < t.$$


The existence of a limiting distribution for $V(\infty) = \lim_{t \to \infty} V(t)$ (referring
here to convergence in law) is clear from the representation (5.23) by noting
that $V(t)$ is normally distributed with mean 0 and variance $\sigma^2(1 - e^{-2\alpha t})/2\alpha$,
which converges to $\sigma^2/2\alpha$ as $t$ increases to $\infty$. We highlight this fact.


Proposition 5.1. For the Ornstein-Uhlenbeck process $\{V(t); t \ge 0\}$ with drift
coefficient $\mu(x) = -\alpha x$, and diffusion coefficient $\sigma^2(x) = \sigma^2$,

$$\lim_{t \to \infty}\Pr\{V(t) \le y\} = \Pr\{V(\infty) \le y\},$$


where $V(\infty)$ is normally distributed with mean zero and variance $\sigma^2/2\alpha$.


THE FORWARD EQUATION 


We shall now derive a second partial differential equation satisfied by the
transition density $p(t, x, y) = \partial P(t, x, y)/\partial y$. The equation can be regarded as
dual to (5.7), and the pertinent variables are $t$ and $y$ (the state variable at time
$t$ rather than the initial state $x$). For this reason, among others, this equation is
commonly called the forward or evolution equation. For versions of this
equation with a discrete sample space we refer the reader to Chapter 14. The
derivation of the forward equation is considerably more complex than that
of the backward equation and often requires modifications in its statement.
For the “formal adjoint” (dual) differential equation to (5.7) we write


$$\frac{\partial}{\partial t}p(t, x, y) = \frac12\frac{\partial^2}{\partial y^2}[\sigma^2(y)p(t, x, y)] - \frac{\partial}{\partial y}[\mu(y)p(t, x, y)]. \tag{5.25}$$


We proceed with a heuristic derivation and subsequently we shall indicate
some of the formidable problems encountered in making the analysis rigorous.
Let $\varphi(t, y)$ be an arbitrary smooth function satisfying the identity


$$\varphi(t + s, y) = \int \varphi(t, \xi)\,p(s, \xi, y)\,d\xi \qquad \text{for all } t, s > 0. \tag{5.26}$$


Differentiate both sides of (5.26) with respect to $s$ and utilize the backward
equation satisfied by $p(s, \xi, y)$. This gives


$$\begin{aligned}
\frac{\partial\varphi}{\partial t}(t + s, y) = \frac{\partial\varphi}{\partial s}(t + s, y) &= \int \varphi(t, \xi)\,\frac{\partial p}{\partial s}(s, \xi, y)\,d\xi \\
&= \int \varphi(t, \xi)\left[\frac12\sigma^2(\xi)\frac{\partial^2 p}{\partial\xi^2} + \mu(\xi)\frac{\partial p}{\partial\xi}\right]d\xi.
\end{aligned} \tag{5.27}$$


Next, integration by parts, assuming that the contributions from the boundaries
vanish, transforms (5.27) into


$$\frac{\partial\varphi}{\partial t}(t + s, y) = \int p(s, \xi, y)\left\{\frac12\frac{\partial^2}{\partial\xi^2}[\sigma^2(\xi)\varphi(t, \xi)] - \frac{\partial}{\partial\xi}[\mu(\xi)\varphi(t, \xi)]\right\}d\xi. \tag{5.28}$$


Since $p(s, \xi, y)$ approaches the delta (degenerate) measure concentrating at $y$
as $s \to 0$, relation (5.28) passes into


$$\frac{\partial\varphi}{\partial t}(t, y) = \frac12\frac{\partial^2}{\partial y^2}[\sigma^2(y)\varphi(t, y)] - \frac{\partial}{\partial y}[\mu(y)\varphi(t, y)], \tag{5.29}$$


which equals (5.25). In particular, the choice $\varphi(t, y) = p(t, x, y)$ certainly obeys
(5.26) (it merely expresses the Markov property of the diffusion process) and
consequently $p(t, x, y)$ “satisfies” the forward equation (5.25).

The analysis was quite loose and to justify the steps requires very stringent 
assumptions. Actually, in some cases p(t, x, y) does not satisfy (5.25), but further 
terms have to be added partly reflecting boundary behavior of the process. 
The full analysis of the forward equation is well beyond the scope of this 
introductory text. 


STATIONARY DISTRIBUTIONS 


If it exists, a stationary density $\psi(x)$ necessarily satisfies


$$\psi(y) = \int \psi(x)\,p(t, x, y)\,dx \qquad \text{for all } t > 0. \tag{5.30}$$


Mimicking the derivation of (5.29) we can deduce that $\psi(y)$ satisfies


$$0 = \frac12\frac{d^2}{dy^2}[\sigma^2(y)\psi(y)] - \frac{d}{dy}[\mu(y)\psi(y)]. \tag{5.31}$$


We should also expect by analogy with the fundamental limit theorem of
Markov chains that the stationary density is approached to the extent that


$$\lim_{t \to \infty} p(t, x, y) = \psi(y) \tag{5.32}$$


holds in some appropriate sense. Under strong conditions on the process,
relation (5.32) obtains, but the development of such facts requires a more
sophisticated context. If, indeed, (5.32) holds, then we can most likely pass to a
limit in (5.29), with $p(t, x, y)$ substituted for $\varphi(t, y)$, and since

$$\lim_{t \to \infty} \partial p(t, x, y)/\partial t = 0$$

is expected, we arrive once again at (5.31). If the convergence in (5.32) occurs
boundedly then we can attempt to interchange the limit $s \to \infty$ with the integral
in the Chapman-Kolmogorov relation $p(t + s, x, y) = \int p(s, x, z)\,p(t, z, y)\,dz$
to obtain (5.30). In all circumstances we achieve (as $s \to \infty$) the inequality (by
Fatou's lemma of real analysis)


$$\psi(y) \ge \int \psi(z)\,p(t, z, y)\,dz.$$

The existence of the limit in (5.32) with $\psi(y)$ representing a bona fide
probability density on the state space (l, r) implies in particular that the process 
is strongly recurrent (positive ergodic) such that probability mass cannot 
escape to the boundaries. Actually, all the possibilities that arise in Markov 
chains, and even more pathologies, can occur in the case of a diffusion process. 


CALCULATION OF THE STATIONARY DISTRIBUTION 
Integrating Eq. (5.31) gives

$$\frac{d}{dy}\left[\frac{\sigma^2(y)\psi(y)}{2}\right] - \mu(y)\psi(y) = \frac{1}{2} C_1, \tag{5.33}$$

where $C_1$ is a constant. Multiplying by the integrating factor

$$s(y) = \exp\left\{-\int^y \frac{2\mu(\xi)}{\sigma^2(\xi)}\,d\xi\right\},$$

we can write (5.33) in the compact form $d[s(y)\sigma^2(y)\psi(y)]/dy = C_1 s(y)$. Another
integration, with $S(x) = \int^x s(y)\,dy$, gives

$$\psi(x) = C_1 \frac{S(x)}{s(x)\sigma^2(x)} + C_2 \frac{1}{s(x)\sigma^2(x)} = m(x)[C_1 S(x) + C_2]. \tag{5.34}$$

The constants are determined to guarantee the constraints $\psi(x) \ge 0$ on $(l, r)$
and $\int_l^r \psi(\xi)\,d\xi = 1$. If this is possible then a stationary density exists and
otherwise not.


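Example 4. For the Ornstein-Uhlenbeck process (5.23), Eq. (5.34) with
$\sigma^2(x) = \sigma^2$, $s(x) = e^{\gamma x^2}$ ($\gamma = \alpha/\sigma^2$) becomes

$$\psi(x) = \frac{C_1}{\sigma^2}\left(\int_0^x e^{\gamma\xi^2}\,d\xi\right) e^{-\gamma x^2} + \frac{C_2}{\sigma^2}\,e^{-\gamma x^2}. \tag{5.35}$$

To ensure that $\psi(x)$ is positive for $|x|$ large entails $C_1 = 0$. It follows that the
unique stationary measure based on (5.34) is the normal density $\psi(x) = c e^{-\gamma x^2}$,
in agreement with Proposition 5.1.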
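
As a numerical illustration of Example 4, the following Python sketch builds the stationary density from (5.34) with the first constant set to zero, that is, proportional to the speed density $1/[\sigma^2(x)s(x)]$, and compares it with the normal density $\sqrt{\gamma/\pi}\,e^{-\gamma x^2}$. It assumes that the Ornstein-Uhlenbeck drift has the form $\mu(x) = -\alpha x$; the function name `stationary_density`, the grid, and the parameter values are illustrative choices and not part of the text.

```python
import math

def stationary_density(mu, sigma2, grid):
    """Normalized density proportional to m(x) = 1 / (sigma2(x) * s(x)) on `grid`."""
    # log s(x) = -integral of 2*mu/sigma2, accumulated by the trapezoid rule.
    log_s = [0.0]
    for x0, x1 in zip(grid, grid[1:]):
        f0 = 2.0 * mu(x0) / sigma2(x0)
        f1 = 2.0 * mu(x1) / sigma2(x1)
        log_s.append(log_s[-1] - 0.5 * (f0 + f1) * (x1 - x0))
    shift = min(log_s)  # rescale s to avoid overflow; normalization removes it
    m = [math.exp(-(ls - shift)) / sigma2(x) for ls, x in zip(log_s, grid)]
    total = sum(0.5 * (m0 + m1) * (x1 - x0)
                for m0, m1, x0, x1 in zip(m, m[1:], grid, grid[1:]))
    return [value / total for value in m]

# Illustrative parameters (assumed): drift -alpha * x, constant variance sigma_sq.
alpha, sigma_sq = 1.5, 2.0
gamma = alpha / sigma_sq
grid = [-6.0 + 0.005 * i for i in range(2401)]
psi = stationary_density(lambda x: -alpha * x, lambda x: sigma_sq, grid)
normal = [math.sqrt(gamma / math.pi) * math.exp(-gamma * x * x) for x in grid]
max_error = max(abs(p - q) for p, q in zip(psi, normal))
print(max_error)
```

The same function can be applied to other drift and variance coefficients, provided the grid covers the region where the density has appreciable mass.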


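Example 5. Wright-Fisher frequency model with mutation [cf. Case (a),
Example F of Section 2]. Let $X(t)$ be the fraction of the A-type at time $t$ in a
population of $N$ individuals comprised of A- and a-types. Suppose the rates of
mutation are $A \xrightarrow{\alpha} a$, $a \xrightarrow{\beta} A$ where $N\alpha \to \gamma_1$, $N\beta \to \gamma_2$. The approximating
diffusion has infinitesimal parameters $\sigma^2(x) = x(1 - x)$, $\mu(x) = -\gamma_1 x + \gamma_2(1 - x)$,
and therefore

$$s(x) = \exp\left\{-\int^x \frac{2[\gamma_2(1 - \xi) - \gamma_1\xi]}{\xi(1 - \xi)}\,d\xi\right\} = x^{-2\gamma_2}(1 - x)^{-2\gamma_1}.$$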
By (5.34) a set of possible stationary measures is determined from the formula

$$\psi(y) = C_1\left[\int_a^y \frac{dx}{x^{2\gamma_2}(1 - x)^{2\gamma_1}}\right](1 - y)^{2\gamma_1 - 1} y^{2\gamma_2 - 1} + C_2 (1 - y)^{2\gamma_1 - 1} y^{2\gamma_2 - 1}. \tag{5.36}$$

The possibilities depend now on the magnitudes of $\gamma_1$ and $\gamma_2$. For either $2\gamma_1$ or
$2\gamma_2$ exceeding or equal to 1, the integral of the first term in (5.36) is not finite.
The condition $\psi(y) \ge 0$ requires that $a = 0$ or 1, and then integrability of $\psi(y)$
on $I = (0, 1)$ compels that $C_1 = 0$. Thus, the stationary distribution is, with the
correct normalizing constant,

$$\psi(y) = \frac{\Gamma(2\gamma_1 + 2\gamma_2)}{\Gamma(2\gamma_1)\Gamma(2\gamma_2)}(1 - y)^{2\gamma_1 - 1} y^{2\gamma_2 - 1}.$$
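
The Beta-type density of Example 5 can be checked numerically. The sketch below, in Python, verifies two things for one illustrative pair of mutation parameters: that the density integrates to one, and that the left side of (5.33) vanishes, as it should when the first constant is zero. The function names and the values `g1 = 1.0`, `g2 = 1.5` are assumptions made for the illustration.

```python
import math

def wf_stationary_density(y, g1, g2):
    """Gamma(2g1+2g2)/(Gamma(2g1)Gamma(2g2)) * (1-y)^(2g1-1) * y^(2g2-1)."""
    norm = math.gamma(2 * g1 + 2 * g2) / (math.gamma(2 * g1) * math.gamma(2 * g2))
    return norm * (1 - y) ** (2 * g1 - 1) * y ** (2 * g2 - 1)

def flux(y, g1, g2, h=1e-5):
    """Left side of (5.33): d/dy[sigma^2 psi / 2] - mu psi, by a central difference."""
    def half_var_psi(z):
        return 0.5 * z * (1 - z) * wf_stationary_density(z, g1, g2)
    mu = -g1 * y + g2 * (1 - y)
    derivative = (half_var_psi(y + h) - half_var_psi(y - h)) / (2 * h)
    return derivative - mu * wf_stationary_density(y, g1, g2)

# Illustrative values (assumed) with 2*g1 >= 1 and 2*g2 >= 1.
g1, g2 = 1.0, 1.5
n = 20000
total_mass = sum(wf_stationary_density((i + 0.5) / n, g1, g2) for i in range(n)) / n
max_flux = max(abs(flux(y, g1, g2)) for y in (0.1, 0.3, 0.5, 0.7, 0.9))
print(total_mass, max_flux)
```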


In the circumstance 


$$2\gamma_1 < 1 \quad\text{and}\quad 2\gamma_2 < 1, \tag{5.37}$$


formula (5.36) provides a two-parameter family of stationary distributions and 
which distribution is appropriate (if any) requires further assumptions or 
information on the real process. Because the boundaries are both “attain- 
able” it is essential to spell out the influence of the boundaries on the sample 
realizations. This involves extra specifications beyond the effects of the basic 
infinitesimal parameters. With this in mind, in Sections 6-8 we elaborate 
on several important facets of diffusion theory including classifications and 
characterizations of boundary behavior. 


Example 6. Wright-Fisher diffusion with mutation and selection. [Compare to
the conjunction of Cases (a) and (b), Example F of Section 2.] The infinitesimal
parameters are $\mu(x) = -\gamma_1 x + \gamma_2(1 - x) + sx(1 - x)$, $\sigma^2(x) = x(1 - x)$. The
stationary distribution derived from (5.34) for $2\gamma_1 > 1$ and $2\gamma_2 > 1$ now takes the
form

$$\psi(y) = \frac{(1 - y)^{2\gamma_1 - 1} y^{2\gamma_2 - 1} e^{2sy}}{\int_0^1 (1 - \xi)^{2\gamma_1 - 1}\xi^{2\gamma_2 - 1} e^{2s\xi}\,d\xi}.$$


THE BACKWARD EQUATION FOR THE KAC FUNCTIONAL 


Let $\{X(t), t \ge 0\}$ be a diffusion and $k(x)$ a nonnegative function defined on the
state space $I = (l, r)$. Consider the function

$$w(x, t) = E_x\left[\exp\left\{-\int_0^t k(X(\tau))\,d\tau\right\} g(X(t))\right], \tag{5.38}$$

defined for $g(x)$ bounded and continuous on $I$.
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We shall develop the analog of the backward differential equation for the 
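function $w(x, t)$. To this end, we start with some useful rearrangements of

$$\begin{aligned}
&\exp\left\{-\int_0^t k(X(\tau))\,d\tau\right\} g(X(t)) \\
&\quad= \exp\left\{-\int_0^h k(X(\tau))\,d\tau\right\}\exp\left\{-\int_h^t k(X(\tau))\,d\tau\right\} g(X(t)) \\
&\quad= \left[\exp\left\{-\int_0^h k(X(\tau))\,d\tau\right\} - 1\right]\exp\left\{-\int_h^t k(X(\tau))\,d\tau\right\} g(X(t)) \\
&\qquad+ \exp\left\{-\int_h^t k(X(\tau))\,d\tau\right\} g(X(t)).
\end{aligned}$$

We next implement a Taylor expansion and take expectations to get

$$\begin{aligned}
w(x, t) ={}& E_x\left[-hk(x)\,E_{X(h)}\left[\exp\left\{-\int_0^{t-h} k(X(\tau))\,d\tau\right\} g(X(t - h))\right]\right] \\
&+ E_x\left[E_{X(h)}\left[\exp\left\{-\int_0^{t-h} k(X(\tau))\,d\tau\right\} g(X(t - h))\right]\right] + o(h).
\end{aligned}$$

Using the time homogeneity of the diffusion process, we have

$$w(x, t) = E_x[\{1 - hk(x)\} w(X(h), t - h)] + o(h).$$

We next apply a Taylor expansion to $w(\cdot, t - h)$ about its state variable,
yielding

$$w(x, t) = E_x\left[\{1 - hk(x)\}\left(w(x, t - h) + \{X(h) - x\}\frac{\partial w}{\partial x}(x, t - h) + \frac{1}{2}\{X(h) - x\}^2\frac{\partial^2 w}{\partial x^2}(x, t - h)\right)\right] + o(h)$$

and then take expectations to get

$$w(x, t) = [1 - hk(x)]\left[w(x, t - h) + \mu(x)h\frac{\partial w}{\partial x}(x, t - h) + \frac{1}{2}\sigma^2(x)h\frac{\partial^2 w}{\partial x^2}(x, t - h)\right] + o(h)$$

or

$$w(x, t) - w(x, t - h) = -hk(x)w(x, t - h) + h\mu(x)\frac{\partial w}{\partial x}(x, t - h) + h\frac{1}{2}\sigma^2(x)\frac{\partial^2 w}{\partial x^2}(x, t - h) + o(h).$$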




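Dividing by $h$ and passing to the limit yields

$$\frac{\partial w}{\partial t}(x, t) = -k(x)w(x, t) + \mu(x)\frac{\partial w}{\partial x}(x, t) + \frac{1}{2}\sigma^2(x)\frac{\partial^2 w}{\partial x^2}(x, t). \tag{5.39}$$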


The necessary initial condition is w(x, 0) = g(x). 
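
Equation (5.39) can be sanity-checked on a case with a closed form. The Python sketch below assumes standard Brownian motion ($\mu = 0$, $\sigma^2 = 1$), a constant killing rate $k(x) = \beta$, and $g(x) = \cos x$; then $E_x[\cos X(t)] = e^{-t/2}\cos x$, so (5.38) gives $w(x, t) = e^{-(\beta + 1/2)t}\cos x$. The choice of $g$, the value of `BETA`, and the function names are assumptions made for the illustration.

```python
import math

BETA = 0.7  # assumed constant killing rate k(x) = BETA

def w(x, t):
    """Closed form of (5.38) for standard Brownian motion, k = BETA, g = cos."""
    return math.exp(-(BETA + 0.5) * t) * math.cos(x)

def residual(x, t, h=1e-4):
    """dw/dt - [-k w + mu w_x + (1/2) sigma^2 w_xx] with mu = 0 and sigma^2 = 1."""
    w_t = (w(x, t + h) - w(x, t - h)) / (2 * h)
    w_xx = (w(x + h, t) - 2 * w(x, t) + w(x - h, t)) / h ** 2
    return w_t - (-BETA * w(x, t) + 0.5 * w_xx)

# The residual should vanish up to finite-difference error.
worst = max(abs(residual(x, t))
            for x in (-1.0, 0.0, 0.4, 2.0)
            for t in (0.1, 0.5, 1.5))
print(worst)
```

The initial condition is met as well, since `w(x, 0.0)` returns `math.cos(x)`.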


P. LEVY'S ARCSINE LAW 

For a standard Brownian motion we construct the occupation time process

$$\Phi(t) = \text{the time spent in the positive half line up to time } t. \tag{5.40}$$

We claim that

$$\Pr\{\Phi(t) \le \tau \mid X(0) = 0\} = \frac{2}{\pi}\arcsin\sqrt{\frac{\tau}{t}}, \quad 0 \le \tau \le t \quad \text{(arcsine law)} \tag{5.41}$$

whose density is $f_\Phi(s) = 1/[\pi\sqrt{s(t - s)}]$ for $0 < s < t$.
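
The arcsine law can be illustrated by simulation. The Python sketch below replaces Brownian motion by a simple symmetric random walk (an approximation chosen here for convenience, not the construction used in the text), records the fraction of steps spent on the positive side, and compares the empirical probability that this fraction is at most 0.1 with the right side of (5.41). The seed, path length, sample size, and function names are arbitrary illustrative choices.

```python
import math
import random

def positive_time_fraction(n_steps, rng):
    """Fraction of steps of a simple symmetric random walk spent above the axis."""
    pos, positive_steps = 0, 0
    for _ in range(n_steps):
        new = pos + (1 if rng.random() < 0.5 else -1)
        # A step counts as positive when either of its endpoints is positive.
        if pos > 0 or new > 0:
            positive_steps += 1
        pos = new
    return positive_steps / n_steps

def arcsine_cdf(u):
    """Right side of (5.41) written in terms of u = tau / t."""
    return (2.0 / math.pi) * math.asin(math.sqrt(u))

rng = random.Random(12345)
fractions = [positive_time_fraction(400, rng) for _ in range(2000)]
empirical = sum(f <= 0.1 for f in fractions) / len(fractions)
mean_fraction = sum(fractions) / len(fractions)
print(empirical, arcsine_cdf(0.1), mean_fraction)
```

Agreement is only approximate, because of both sampling error and the discretization of the path.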
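In order to validate (5.41) we define

$$k(x) = \begin{cases} \beta & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases} \tag{5.42}$$

where $\beta > 0$ is a free parameter at our disposal. Obviously

$$\beta\Phi(t) = \int_0^t k(X(s))\,ds.$$

Consider

$$v(x, t) = E_x[e^{-\beta\Phi(t)}] = E_x\left[\exp\left\{-\int_0^t k(X(s))\,ds\right\}\right]. \tag{5.43}$$

This is exactly (5.38) with $g(x) \equiv 1$. Accordingly, from the development of
(5.39) we should expect that $v(x, t)$ uniquely solves

$$\frac{\partial v}{\partial t}(x, t) = \begin{cases} \dfrac{1}{2}\dfrac{\partial^2 v(x, t)}{\partial x^2} - \beta v(x, t), & x > 0,\ t > 0, \\[2ex] \dfrac{1}{2}\dfrac{\partial^2 v(x, t)}{\partial x^2}, & x < 0,\ t > 0, \end{cases} \tag{5.44}$$

subject to the initial condition $v(x, 0+) = 1$ ($= g(x)$) and the continuity relations

$$v(0+, t) = v(0-, t) \quad\text{and}\quad \frac{\partial v}{\partial x}(0+, t) = \frac{\partial v}{\partial x}(0-, t), \quad t > 0. \tag{5.45}$$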


Observe that we expect continuity for v(x, t) and its derivative despite the 
discontinuity in the second derivative manifested in the differential equation 
(5.44). 


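It is difficult to handle (5.44) directly and more convenient to work with the
(Laplace transform) function

$$V(x, \lambda) = \int_0^\infty e^{-\lambda t} v(x, t)\,dt \quad \text{defined for } \lambda > 0. \tag{5.46}$$

The corresponding differential equation for $V(x, \lambda)$ (obtained directly from
(5.44) involving also an integration by parts) takes the form

$$\begin{aligned}
\lambda V(x, \lambda) - \frac{1}{2}\frac{\partial^2 V(x, \lambda)}{\partial x^2} + \beta V(x, \lambda) - 1 &= 0 \quad \text{for all } x > 0, \\
\lambda V(x, \lambda) - \frac{1}{2}\frac{\partial^2 V(x, \lambda)}{\partial x^2} - 1 &= 0 \quad \text{for all } x < 0,
\end{aligned} \tag{5.47}$$

subject to the obvious conditions that $V(x, \lambda)$ is bounded for all real $x$, $\lambda > 0$,
and, owing to (5.45),

$$V(0+, \lambda) = V(0-, \lambda), \quad V'(0+, \lambda) = V'(0-, \lambda) \tag{5.48}$$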


(the prime indicates differentiation with respect to $x$). Relations (5.47) on the
positive and negative real line present ordinary differential equations which
can be readily solved with insistence on a bounded solution, thereby yielding

$$V(x, \lambda) = \begin{cases} \dfrac{1}{\lambda + \beta} + A e^{-\sqrt{2(\lambda + \beta)}\,x}, & x > 0, \\[2ex] \dfrac{1}{\lambda} + B e^{\sqrt{2\lambda}\,x}, & x < 0. \end{cases} \tag{5.49}$$


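The continuity conditions (5.48) reduce to $A + 1/(\lambda + \beta) = B + 1/\lambda$ and
$A\sqrt{2(\lambda + \beta)} = -B\sqrt{2\lambda}$. Their solution is

$$A = \frac{1}{\lambda + \beta}\left(\sqrt{\frac{\lambda + \beta}{\lambda}} - 1\right) \quad\text{and}\quad B = \frac{1}{\lambda}\left(\sqrt{\frac{\lambda}{\lambda + \beta}} - 1\right), \tag{5.50}$$

and

$$\int_0^\infty e^{-\lambda t} v(0, t)\,dt = V(0, \lambda) = A + \frac{1}{\lambda + \beta} = \frac{1}{\sqrt{\lambda(\lambda + \beta)}}. \tag{5.51}$$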
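Being a Laplace transform, the solution $v(0, t)$ to (5.51) is unique, and thus, to
establish that

$$v(0, t) = \int_0^t \frac{1}{\pi\sqrt{s(t - s)}}\,e^{-\beta s}\,ds \quad\text{for } t > 0 \tag{5.52}$$

we need only to verify that (5.52) satisfies (5.51). To this end,

$$\begin{aligned}
\int_0^\infty e^{-\lambda t}\left\{\frac{1}{\pi}\int_0^t \frac{e^{-\beta s}}{\sqrt{s}\sqrt{t - s}}\,ds\right\} dt
&= \frac{1}{\pi}\int_0^\infty \frac{e^{-\beta s}}{\sqrt{s}}\left\{\int_s^\infty \frac{e^{-\lambda t}}{\sqrt{t - s}}\,dt\right\} ds \\
&= \frac{1}{\pi}\left\{\int_0^\infty \frac{e^{-(\lambda + \beta)s}}{\sqrt{s}}\,ds\right\}\left\{\int_0^\infty \frac{e^{-\lambda u}}{\sqrt{u}}\,du\right\} \\
&= \frac{1}{\sqrt{\lambda(\lambda + \beta)}}.
\end{aligned}$$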


Finally, (5.52) and the definition $v(0, t) = E[e^{-\beta\Phi(t)} \mid X(0) = 0] = \int_0^t e^{-\beta s} f_\Phi(s)\,ds$
show that the density for $\Phi(t)$ is

$$f_\Phi(s) = \frac{1}{\pi\sqrt{s(t - s)}} \quad\text{for } 0 < s < t,$$

thereby completing the derivation of (5.41).
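
The Laplace-transform identity behind the arcsine law lends itself to a quick numerical check. In the Python sketch below, the substitution $s = t\sin^2\theta$ turns (5.52) into $v(0, t) = (2/\pi)\int_0^{\pi/2} e^{-\beta t\sin^2\theta}\,d\theta$, whose transform in $t$ is $(2/\pi)\int_0^{\pi/2} d\theta/(\lambda + \beta\sin^2\theta)$; this is compared with $1/\sqrt{\lambda(\lambda + \beta)}$. The constants $A$ and $B$ coded here are the values obtained by solving the two continuity conditions stated above, and the code confirms that they satisfy them. The substitution, the function names, and the sample values of $\lambda$ and $\beta$ are choices made for this illustration.

```python
import math

def V0_from_arcsine(lam, beta, n=20000):
    """(2/pi) * integral over [0, pi/2] of 1/(lam + beta sin^2 theta), midpoint rule."""
    h = (math.pi / 2) / n
    total = sum(1.0 / (lam + beta * math.sin((i + 0.5) * h) ** 2) for i in range(n))
    return (2 / math.pi) * h * total

def V0_closed_form(lam, beta):
    """Right side of (5.51)."""
    return 1.0 / math.sqrt(lam * (lam + beta))

def constants_A_B(lam, beta):
    """Solution of A + 1/(lam+beta) = B + 1/lam and A sqrt(2(lam+beta)) = -B sqrt(2 lam)."""
    A = (math.sqrt((lam + beta) / lam) - 1.0) / (lam + beta)
    B = (math.sqrt(lam / (lam + beta)) - 1.0) / lam
    return A, B

lam, beta = 1.3, 2.0  # illustrative values
A, B = constants_A_B(lam, beta)
value_gap = (A + 1 / (lam + beta)) - (B + 1 / lam)
slope_gap = A * math.sqrt(2 * (lam + beta)) + B * math.sqrt(2 * lam)
transform_gap = V0_from_arcsine(lam, beta) - V0_closed_form(lam, beta)
print(value_gap, slope_gap, transform_gap)
```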


6: Boundary Classification for Regular Diffusion Processes 


Let $\{X(t), t \ge 0\}$ be a regular diffusion process on an interval $I$ having left
boundary $l$ and right boundary $r$. For $x$ in $(l, r)$ we postulate the continuous
infinitesimal drift and variance coefficients $\mu(x)$ and $\sigma^2(x) > 0$, respectively.

In this section we develop the modern classifications of possible behavior 
near the boundaries $l$ and $r$. We concentrate on the left boundary $l$, the right
being entirely similar. The approach is to let $a$ decrease to $l$ in the quantities

$$u(x) = u_{a,b}(x) = \Pr\{T_b < T_a \mid X(0) = x\}, \quad l < a < x < b < r, \tag{6.1}$$

and

$$v(x) = v_{a,b}(x) = E[T_{a,b} \mid X(0) = x], \quad l < a < x < b < r, \tag{6.2}$$

where $T_z$ is the hitting time to $z$ and $T_{a,b} = T_a \wedge T_b = \min\{T_a, T_b\}$.
Recall the scale function $S(x)$ with explicit expression

$$S(x) = \int_{x_0}^x s(\xi)\,d\xi; \qquad s(\xi) = \exp\left\{-\int_{\xi_0}^{\xi} [2\mu(\eta)/\sigma^2(\eta)]\,d\eta\right\}, \tag{6.3}$$

where $x_0$ and $\xi_0$ are arbitrary fixed points inside $(l, r)$. As indicated in Section
3, the particular choice of $x_0$ and $\xi_0$ is of no relevance.
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It simplifies the exposition if we introduce the scale measure, the function 
$S[J]$ of closed intervals $J = [c, d] \subset (l, r)$ defined by

$$S[J] = S[c, d] = S(d) - S(c).$$


We denote both the scale function and scale measure by the same symbol S; 
no confusion results. 

We freely use the scale measure $dS(x) = S[dx]$ of an infinitesimal interval
$[x, x + dx]$ with $S[dx] = S(x + dx) - S(x) = dS(x) = s(x)\,dx$. For example, we
evaluate $\int_a^b f(x)\,dS(x)$ using the usual integral $\int_a^b f(x)s(x)\,dx$, for, say, piecewise
continuous functions $f(x)$.

The reader should check that $0 < S[c, d] < \infty$ for $l < c < d < r$, and that

$$S[c, d] = S[c, x] + S[x, d] \quad\text{for } l < c < x < d < r. \tag{6.4}$$

Similarly we introduce the speed measure $M$ induced by the speed density
$m(x) = 1/[\sigma^2(x)s(x)]$, where

$$M[J] = M[c, d] = \int_c^d m(x)\,dx, \quad J = [c, d] \subset (l, r).$$

Again, $M[J]$ is positive and finite for $J = [c, d] \subset (l, r)$.
In terms of the scale and speed measures, (6.1) and (6.2) are written

$$u(x) = u_{a,b}(x) = S[a, x]/S[a, b], \quad l < a < x < b < r, \tag{6.5}$$

and

$$v(x) = v_{a,b}(x) = 2\left\{u(x)\int_x^b S[\eta, b]\,dM(\eta) + [1 - u(x)]\int_a^x S[a, \eta]\,dM(\eta)\right\}. \tag{6.6}$$
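
Formulas (6.5) and (6.6) translate directly into numerical quadrature. The Python sketch below takes a scale density `s` and a speed density `m` as plain functions and evaluates the hitting probability and the mean exit time by the midpoint rule. The helper names (`integrate`, `hit_b_first`, `mean_exit_time`) and the test cases are assumptions made for the illustration; for standard Brownian motion ($s = m = 1$) the results should reduce to $(x - a)/(b - a)$ and $(x - a)(b - x)$.

```python
import math

def integrate(f, lo, hi, n=2000):
    """Midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def hit_b_first(x, a, b, s):
    """u(x) = S[a, x] / S[a, b], as in (6.5)."""
    return integrate(s, a, x) / integrate(s, a, b)

def mean_exit_time(x, a, b, s, m, n=200):
    """v(x) as in (6.6), with dM(eta) = m(eta) d(eta)."""
    u = hit_b_first(x, a, b, s)
    upper = integrate(lambda eta: integrate(s, eta, b, n) * m(eta), x, b, n)
    lower = integrate(lambda eta: integrate(s, a, eta, n) * m(eta), a, x, n)
    return 2.0 * (u * upper + (1.0 - u) * lower)

# Standard Brownian motion: s = m = 1.
one = lambda z: 1.0
u_bm = hit_b_first(0.5, 0.0, 2.0, one)
v_bm = mean_exit_time(0.5, 0.0, 2.0, one, one)

# Brownian motion with drift -1 and unit variance: s(xi) = exp(2 xi).
s_drift = lambda z: math.exp(2.0 * z)
u_drift = hit_b_first(0.5, 0.0, 2.0, s_drift)
u_exact = (math.exp(1.0) - 1.0) / (math.exp(4.0) - 1.0)
print(u_bm, v_bm, u_drift, u_exact)
```

The nested quadrature in `mean_exit_time` costs on the order of `n * n` evaluations, so `n` is kept small there.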


It follows from the nonnegativity of the measure $S$ and (6.4) that $S[a, b]$ is
monotonic in $a$ for fixed $b$ and that therefore we may define $S(l, b] \le \infty$ by

$$S(l, b] = \lim_{a \downarrow l} S[a, b] \le \infty, \quad l < b < r. \tag{6.7}$$

If $[a, b] \subset (l, r)$, then $0 < S[a, b] < \infty$. As an easy consequence of this and
(6.4) we have

$$S(l, b] = \infty \text{ for some } b \text{ in } (l, r) \quad\text{if and only if}\quad S(l, b] = \infty \text{ for all } b \text{ in } (l, r). \tag{6.8}$$

Please observe the judicious use of parentheses and brackets. We write
$S(l, b]$ (and not $S[l, b]$) to emphasize the definition as a limit. The value of
$S(l, b]$ depends only on the process parameters in the interior of the state space
and, indeed, whether or not the boundary point $l$ is included among the
possible states is immaterial.
We turn to the hitting time of $l$. For each sample path starting at $X(0) = x$
in $(a, b)$ the hitting time $T_a$ is a monotonically nonincreasing function of $a$. It
follows that we may define the random time $T_{l+} = \lim_{a \downarrow l} T_a \le \infty$. We now
show that $T_{l+} = T_l$, the hitting time to $l$. When $X(0) = x$ in $(a, b)$, certainly
$T_a \le T_l$ whence $T_{l+} = \lim_{a \downarrow l} T_a \le T_l$. If $T_{l+} = \infty$, then $T_l = T_{l+} = \infty$, so
suppose $T_{l+} < \infty$. Because the paths are continuous,

$$X(T_{l+}) = \lim_{a \downarrow l} X(T_a) = \lim_{a \downarrow l} a = l \ge -\infty$$

when $T_{l+} < \infty$, and thus $T_{l+} \ge T_l = \inf\{t \ge 0: X(t) = l\}$. We have shown both
$T_{l+} \le T_l$ and $T_{l+} \ge T_l$. Thus $T_{l+} = T_l \le \infty$.

Note that $T_l$ is defined even when $l$ is not a possible state (and then
$T_l = \infty$).

The following result ensues directly from (6.5) and the preceding 
discussion. 


Lemma 6.1. (i) Suppose $S(l, x_0] < \infty$ for some $x_0$ in $(l, r)$. Then

$$\Pr\{T_{l+} \le T_b \mid X(0) = x\} > 0 \quad\text{for all } l < x < b < r.$$

(ii) Suppose $S(l, x_0] = \infty$ for some $x_0$ in $(l, r)$. Then

$$\Pr\{T_{l+} \le T_b \mid X(0) = x\} = 0 \quad\text{for all } l < x < b < r.$$

Remark 6.1. For distinct points $a, b$ in $(l, r)$, the equality $T_a = T_b$ cannot
occur, since a continuous path cannot be simultaneously in two distinct places.
But $T_{l+} = T_b$ can occur, provided both are infinite. Brownian motion with
negative drift to $l = -\infty$ provides a simple example. For those paths starting
at $x < b$ which never reach $b$, then $T_b = \infty$ by convention, while the drift to
$l = -\infty$ implies $T_{l+} = \infty$ in a possibly different sense.


In view of Lemma 6.1 we have the following definition. 


Definition 6.1. The boundary $l$ is attracting if $S(l, x] < \infty$ and this criterion
applies independently of $x$ in $(l, r)$.


Example 1. For a standard Brownian motion, $l = -\infty$ is not attracting since
$S[a, x] = x - a$ (up to a fixed multiplicative constant) and $S(-\infty, x] = \infty$. On
the other hand, suppose the process has drift $\mu = -\alpha < 0$. Then, again up to a
fixed multiplicative constant, $S[a, x] = e^{2\alpha x} - e^{2\alpha a} \to e^{2\alpha x} < \infty$ as $a \to -\infty$.
Thus $l = -\infty$ is attracting for a Brownian motion with negative drift.
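
The contrast in Example 1 is easy to see numerically. The Python sketch below evaluates the scale measure of $[a, x]$ for Brownian motion with unit variance and constant drift, including the multiplicative constant that the example suppresses, and lets $a$ run off to $-\infty$. The function name `scale_measure_bm` and the sample values are illustrative assumptions.

```python
import math

def scale_measure_bm(a, x, drift):
    """S[a, x] for unit-variance Brownian motion with constant drift.

    The scale density is s(xi) = exp(-2 * drift * xi); drift = 0 gives x - a.
    """
    if drift == 0.0:
        return x - a
    return (math.exp(-2 * drift * x) - math.exp(-2 * drift * a)) / (-2 * drift)

for a in (-1.0, -10.0, -100.0):
    no_drift = scale_measure_bm(a, 0.0, 0.0)         # grows without bound
    negative_drift = scale_measure_bm(a, 0.0, -1.0)  # converges to exp(0) / 2
    print(a, no_drift, negative_drift)
```

With drift $-1$ the values settle at $e^{2x}/2$ evaluated at $x = 0$, while without drift they grow linearly in $|a|$.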


We learn from this example that an attracting boundary need not be in the
state space of the process. The example when $\mu = -\alpha < 0$ also shows that the
probability of reaching an attracting boundary in finite time may be zero.
(Lemma 6.2 will refine this conclusion.)
While $l = -\infty$ is attracting for a Brownian motion with negative drift, it is
not attainable in finite expected time. This leads to the next refinement in the
study of boundary behavior.


WHEN IS A BOUNDARY ATTAINABLE IN FINITE EXPECTED TIME? 


Consider an attracting boundary $l$ wherein $S(l, x] < \infty$ for all $x$ in $(l, r)$. As
affirmed by Lemma 6.1, the boundary $l$ can be reached prior to reaching an
arbitrary state $b$ with positive probability from any interior starting point
$x < b$, although not necessarily in finite time.
We next evaluate, with $b$ fixed, $b < r$,

$$\lim_{a \downarrow l} E_x[T_a \wedge T_b], \quad x < b < r,$$

the expected first passage time for the attainment of the boundary $l$ or the level
$b$ commencing from $x$. (We use freely, as in Section 5, the notation $E_x$ to
indicate the expectation corresponding to the initial state $x$.) Recall that
$T_a \wedge T_b$ denotes the time of first hitting the level $a$ or $b$.
Referring to (6.5) and (6.6), we see that

$$\lim_{a \downarrow l} E_x[T_a \wedge T_b] = \lim_{a \downarrow l} \frac{2S[a, x]}{S[a, b]}\int_x^b S[\xi, b]\,dM(\xi) + \lim_{a \downarrow l} \frac{2S[x, b]}{S[a, b]}\int_a^x S[a, \xi]\,dM(\xi).$$

Since $l$ is attracting by assumption, it is obvious that $\lim_{a \downarrow l}(S[a, x]/S[a, b])$
is finite and positive, and accordingly the first right-hand term above is finite.
Similarly, $\lim_{a \downarrow l}(S[x, b]/S[a, b])$ is finite and positive. It follows that

$$\lim_{a \downarrow l} E_x[T_a \wedge T_b] < \infty \quad\text{if and only if}\quad \lim_{a \downarrow l}\int_a^x S[a, \xi]\,dM(\xi) < \infty. \tag{6.9}$$

Thusly motivated we define (employing some obvious interchanges of order of
integration)

$$\begin{aligned}
\Sigma(l) &= \lim_{a \downarrow l}\int_a^x S[a, \xi]\,dM(\xi) = \int_l^x S(l, \xi]\,dM(\xi) = \int_l^x \left\{\int_l^\xi s(\eta)\,d\eta\right\} m(\xi)\,d\xi \\
&= \int_l^x \left[\int_\eta^x m(\xi)\,d\xi\right] s(\eta)\,d\eta = \int_l^x M[\eta, x]\,dS(\eta).
\end{aligned} \tag{6.10}$$

Notice that we have introduced the notation $\Sigma(l, x) = \Sigma(l)$ to represent the
above double integral. It depends on $x$ but in later considerations only
whether its value is finite or infinite is relevant and we can therefore suppress
the dependence on $x$ without ambiguity. (See the proof of Lemma 6.2, for
example.)
Expressed in terms of $\Sigma(l)$ we have the following dichotomy.


Definition 6.2. The boundary $l$ is said to be


(i) attainable if $\Sigma(l) < \infty$,
(ii) unattainable if $\Sigma(l) = \infty$.


An elementary argument shows that $S(l, x] < \infty$ whenever $\Sigma(l) < \infty$, and
hence, if $l$ is attainable, then $l$ is attracting. An unattainable boundary may or
may not be attracting.


Example 2. Consider a Brownian motion with negative drift. Then $l = -\infty$
is attracting but unattainable since the mean time to reach $l$ or $b$ from any state
$x < b$ is infinite.


Example 3. Consider an absorbing Brownian motion on the state space
$[0, \infty)$ (Example B of Section 2) where $l = 0$ is an absorbing state. Then $l$ is
attracting and attainable since $E[T_0 \wedge T_b \mid X(0) = x] = x(b - x) < \infty$ for
$0 < x < b$. However, $E[T_0 \mid X(0) = x] = \infty$. Thus the mean time to reach an
attainable boundary may be infinite.

When a boundary / is attainable, then those realizations for which it is 
reached have finite expected time when limited by the alternative that a 
prescribed state b is reached first. Lemma 6.2, which follows, shows that an 
attainable boundary can be reached in finite time with positive probability. 
(Compare with Example 1.) The expected time to reach an unattainable 
boundary is always infinite. 


Lemma 6.2. Let $l$ be an attracting boundary and suppose $l < x < b < r$. Then
the following are equivalent:


(i) $\Pr\{T_l < \infty \mid X(0) = x\} > 0$;
(ii) $E[T_l \wedge T_b \mid X(0) = x] < \infty$;
(iii) $\Sigma(l) = \int_l^x S(l, \eta]\,dM(\eta) < \infty$.


Proof. That (ii) and (iii) are equivalent was shown in (6.9). To show that (ii)
implies (i), assume that $E[T_l \wedge T_b \mid X(0) = x] < \infty$. Then $T_l \wedge T_b < \infty$ (with
probability 1) and, according to Remark 6.1, $T_l \ne T_b$. Since $l$ is attainable, then
$\Pr\{T_l \le T_b \mid X(0) = x\} > 0$, and thus $\Pr\{T_l < T_b \mid X(0) = x\} > 0$. Finally,

$$\Pr\{T_l < \infty \mid X(0) = x\} \ge \Pr\{T_l < T_b \le \infty \mid X(0) = x\} > 0.$$

The last part of the proof is to show that (i) implies (iii). Suppose that
$\Pr\{T_l < \infty \mid X(0) = x\} > 0$. Then there exists $t > 0$ for which

$$\Pr\{T_l < t \mid X(0) = x\} = \alpha > 0.$$
Every path starting at $x$ and reaching $l$ prior to time $t$ visits every intervening
state $\xi$ in $(l, x]$, and then travels from $\xi$ to $l$. We formalize this observation by
writing $T_l = T_\xi + T_l(\xi)$, where $T_l(\xi)$ is the duration from the first visit to $\xi$
to the first visit to $l$. Then $T_l \ge T_l(\xi)$, whence

$$0 < \alpha \le \Pr\{T_l(\xi) < t\} = \Pr\{T_l < t \mid X(0) = \xi\} \le \Pr\{T_l \wedge T_b < t \mid X(0) = \xi\}, \quad\text{for any } \xi \text{ in } (l, x].$$

It follows that

$$\sup_{\xi \text{ in } (l, x]} \Pr\{T_l \wedge T_b > t \mid X(0) = \xi\} \le 1 - \alpha < 1,$$

and, then by induction using the Markov property, that

$$\sup_{\xi \text{ in } (l, x]} \Pr\{T_l \wedge T_b > nt \mid X(0) = \xi\} \le (1 - \alpha)^n \quad\text{for } n \ge 1.$$

From the latter expression, we derive $E_\xi[T_l \wedge T_b] \le t/\alpha < \infty$. But, because
(ii) and (iii) are equivalent, then $\int_l^\xi S(l, \eta]\,dM(\eta) < \infty$.
Finally, because $\int_\xi^x S(l, \eta]\,dM(\eta) < \infty$, then

$$\int_l^x S(l, \eta]\,dM(\eta) = \int_l^\xi S(l, \eta]\,dM(\eta) + \int_\xi^x S(l, \eta]\,dM(\eta) < \infty. \qquad\blacksquare$$


Roughly speaking, $\Sigma(l)$ measures the time it takes to reach the boundary $l$
or an alternative interior state $b$ starting from an interior point $x < b$. We
next introduce the quantities

$$M(l, x] = \lim_{a \downarrow l} M[a, x] \tag{6.11}$$

and

$$N(l) = \int_l^x S[\eta, x]\,dM(\eta) = \int_l^x M(l, \xi]\,dS(\xi). \tag{6.12}$$

Then $M(l, x]$ measures the speed of the process near $l$ and $N(l)$ roughly
measures the time it takes to reach an interior point $x$ in $(l, r)$ starting at the
boundary $l$.

The modern classification of boundary behavior is based on the values
(actually on whether they are finite or infinite) of the four functionals $S(l, x]$,
$\Sigma(l)$, $N(l)$, and $M(l, x]$. These criteria are not independent. Lemma 6.3
delineates their relationships.


Lemma 6.3. The following relations hold between $S(l, x]$, $\Sigma(l)$, $N(l)$, and
$M(l, x]$:


(i) $S(l, x] = \infty$ implies $\Sigma(l) = \infty$,
(ii) $\Sigma(l) < \infty$ implies $S(l, x] < \infty$,
(iii) $M(l, x] = \infty$ implies $N(l) = \infty$,
(iv) $N(l) < \infty$ implies $M(l, x] < \infty$,
(v) $\Sigma(l) + N(l) = S(l, x] M(l, x]$.
Proof. (i) $S(l, x] = \infty$ implies $S(l, \xi] = \infty$ for all $\xi$ in $(l, r)$ and hence

$$\Sigma(l) = \int_l^x S(l, \xi]\,dM(\xi) = \infty$$

since $M$ is a strictly positive measure. Statement (ii) is the contrapositive to (i).
(iii) $M(l, x] = \infty$ implies $M(l, \xi] = \infty$ for all $\xi$ whence $N(l) = \int_l^x M(l, \xi]\,dS(\xi) = \infty$
since $S$ is a strictly positive measure. Statement (iv) is the contrapositive to
(iii). (v) Upon summing

$$\Sigma(l) = \int_l^x S(l, \xi]\,dM(\xi)$$

and

$$N(l) = \int_l^x S[\xi, x]\,dM(\xi),$$

we obtain

$$\begin{aligned}
\Sigma(l) + N(l) &= \int_l^x \{S(l, \xi] + S[\xi, x]\}\,dM(\xi) \\
&= S(l, x]\int_l^x dM(\xi) \qquad \text{[by (6.4)]} \\
&= S(l, x] M(l, x]. \qquad\blacksquare
\end{aligned}$$
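
Part (v) of Lemma 6.3 rests only on the additivity (6.4), so it already holds on any closed subinterval $[c, x]$ before the limit $c \downarrow l$ is taken. The Python sketch below checks this truncated identity by quadrature for the densities $s(\xi) = e^{\xi^2}$ and $m(\xi) = e^{-\xi^2}$; these densities, the endpoints, and the helper names are assumptions made for the illustration.

```python
import math

def integrate(f, lo, hi, n=400):
    """Midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

s = lambda xi: math.exp(xi * xi)    # assumed scale density
m = lambda xi: math.exp(-xi * xi)   # assumed speed density

def S(lo, hi):
    """Scale measure S[lo, hi]."""
    return integrate(s, lo, hi)

c, x = -1.0, 0.5
sigma_c = integrate(lambda xi: S(c, xi) * m(xi), c, x)  # integral of S[c, xi] dM(xi)
n_c = integrate(lambda xi: S(xi, x) * m(xi), c, x)      # integral of S[xi, x] dM(xi)
product = S(c, x) * integrate(m, c, x)                  # S[c, x] * M[c, x]
print(sigma_c + n_c, product)
```

The two printed numbers agree up to quadrature error.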


Table 6.1 lists the 16 combinations of assignments of finite or infinite
values to the four quantities $S(l, x]$, $\Sigma(l)$, $N(l)$, and $M(l, x]$. Those that are ruled
out by Lemma 6.3 are indicated by X with the appropriate reason noted.

The six realizable combinations of boundary criteria values have been 
grouped and labeled differently by different authors. William Feller introduced 
the original classification, adhered to by most American probabilists. The 
Russian school uses a slightly different formalization. Both groupings have 
their merits and are now juxtaposed. 

Table 6.2 lists again the six possible combinations, rows (1), (6), (8), (11),
(12), and (16) in Table 6.1 along with the terminology of both the Feller and 
the Russian classification schemes. 
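
The bookkeeping behind Table 6.1 can be reproduced mechanically. The Python sketch below enumerates the 16 finite/infinite assignments in the order $S(l, x]$, $M(l, x]$, $\Sigma(l)$, $N(l)$ and discards those excluded by Lemma 6.3; relation (v) is used in the form that its two sides must be finite or infinite together, which relies on both measures being strictly positive. The names `FIN`, `INF`, and `ruled_out` are illustrative.

```python
from itertools import product

FIN, INF = "< inf", "= inf"

def ruled_out(S, M, Sigma, N):
    """Return the reason Lemma 6.3 excludes a combination, or None if it is allowed."""
    if Sigma == FIN and S == INF:
        return "Sigma(l) < inf implies S(l,x] < inf"
    if N == FIN and M == INF:
        return "N(l) < inf implies M(l,x] < inf"
    # (v): Sigma + N = S * M, so both sides are finite or both are infinite.
    lhs_finite = Sigma == FIN and N == FIN
    rhs_finite = S == FIN and M == FIN
    if lhs_finite != rhs_finite:
        return "Sigma(l) + N(l) = S(l,x] * M(l,x]"
    return None

# Rows in the column order (S, M, Sigma, N), with the last column varying fastest.
rows = list(product((FIN, INF), repeat=4))
possible = [i + 1 for i, combo in enumerate(rows) if ruled_out(*combo) is None]
print(possible)
```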


(a) Regular Boundary 


A diffusion process can both enter and leave from a regular boundary. For the
full characterization of the process, the behavior at the boundary must be
specified, and this can be done by assigning a speed $M[\{l\}]$ to the boundary $l$
itself. The behavior can range from absorption ($M[\{l\}] = \infty$) to reflection
($M[\{l\}] = 0$) as epitomized by absorbing and reflecting Brownian motion.
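TABLE 6.1 All combinations of assignments of finite or infinite values to the four quantities
$S(l, x]$, $\Sigma(l)$, $M(l, x]$, and $N(l)$ defined in (6.7) and (6.10)-(6.12), respectively


        $S(l, x]$    $M(l, x]$    $\Sigma(l)$   $N(l)$       Possible?   Reason
(1)     $< \infty$   $< \infty$   $< \infty$    $< \infty$   Yes
(2)     $< \infty$   $< \infty$   $< \infty$    $= \infty$   X           $\Sigma(l) + N(l) = S(l, x] \times M(l, x]$
(3)     $< \infty$   $< \infty$   $= \infty$    $< \infty$   X           $\Sigma(l) + N(l) = S(l, x] \times M(l, x]$
(4)     $< \infty$   $< \infty$   $= \infty$    $= \infty$   X           $\Sigma(l) + N(l) = S(l, x] \times M(l, x]$
(5)     $< \infty$   $= \infty$   $< \infty$    $< \infty$   X           $N(l) < \infty \Rightarrow M(l, x] < \infty$
(6)     $< \infty$   $= \infty$   $< \infty$    $= \infty$   Yes
(7)     $< \infty$   $= \infty$   $= \infty$    $< \infty$   X           $N(l) < \infty \Rightarrow M(l, x] < \infty$
(8)     $< \infty$   $= \infty$   $= \infty$    $= \infty$   Yes
(9)     $= \infty$   $< \infty$   $< \infty$    $< \infty$   X           $\Sigma(l) < \infty \Rightarrow S(l, x] < \infty$
(10)    $= \infty$   $< \infty$   $< \infty$    $= \infty$   X           $\Sigma(l) < \infty \Rightarrow S(l, x] < \infty$
(11)    $= \infty$   $< \infty$   $= \infty$    $< \infty$   Yes
(12)    $= \infty$   $< \infty$   $= \infty$    $= \infty$   Yes
(13)    $= \infty$   $= \infty$   $< \infty$    $< \infty$   X           $\Sigma(l) < \infty \Rightarrow S(l, x] < \infty$
(14)    $= \infty$   $= \infty$   $< \infty$    $= \infty$   X           $\Sigma(l) < \infty \Rightarrow S(l, x] < \infty$
(15)    $= \infty$   $= \infty$   $= \infty$    $< \infty$   X           $N(l) < \infty \Rightarrow M(l, x] < \infty$
(16)    $= \infty$   $= \infty$   $= \infty$    $= \infty$   Yes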


Behavior somewhat between absorption and reflection ($0 < M[\{l\}] < \infty$) is
the sticky barrier phenomenon, where a strictly positive duration is spent at
the boundary. This duration at the boundary contains no interval, however,
and is not describable in an elementary way.†

Another possibility is to restart the process in the interior of the state space 
upon first attaining the regular boundary point. While such a process may be 
Markov it is not a diffusion in our strict sense because the paths are not 
continuous everywhere. However, the transition density of such a process will 
satisfy the backward differential equation, and indeed, while it is evolving in 
the interior of the state space such a process is indistinguishable from a 
diffusion and thus diffusion process techniques can be brought to bear in its 
study. 

To establish that a boundary point $l$ is regular, it suffices to check that
$S(l, x] < \infty$ and $M(l, x] < \infty$.


(b) Exit Boundary 
In Section 7 we shall show that at an exit boundary $l$ we must have

$$\lim_{b \downarrow l}\lim_{x \downarrow l} \Pr\{T_b < t \mid X(0) = x\} = 0 \quad\text{for all } t > 0.$$

Thus, starting at $l$ (i.e., where the initial point $x$ approaches $l$) it is impossible
to reach any interior state $b$ no matter how near $b$ is to $l$. This is tantamount
to the assertion that once the boundary $l$ is attained, no continuous sample
path can be extricated from $l$.


† The time at the boundary is similar to a Cantor set of positive Lebesgue measure.
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It is possible to preserve the Markov property by having the process 
sojourn at l for a random duration, necessarily following an exponential 
distribution if the Markov property is to be maintained, followed by a jump 
into the interior $(l, r)$ of the state space according to a prescribed probability
law. If such discontinuous trajectories are precluded, that is, if the Markov
process is a diffusion, then the exit boundary $l$ must be a trap state, i.e., an
absorbing point.

If $l$ is an exit boundary, then $T_l$, the exit time at $l$, is the limit $T_l = \lim_{a \downarrow l} T_a$
for processes starting at $X(0)$ in $(l, r)$. Thus, for example,
$E[T_l \wedge T_b \mid X(0) = x] = v_{l,b}(x) = \lim_{a \downarrow l} v_{a,b}(x) = \lim_{a \downarrow l} E[T_a \wedge T_b \mid X(0) = x]$,
and the types of calculations of Section 4 find relevance here as well.

To establish that a boundary $l$ is exit, it suffices to show that $\Sigma(l) < \infty$ but
$M(l, x] = \infty$.


(c) Entrance Boundary 


An entrance boundary cannot be reached from the interior of the state space, 
but it is possible, and in many applications quite natural, to consider processes 
that begin there. Such processes quickly move to the interior never to return to 
the entrance boundary. 

Consider, for the moment, a diffusion process on $[l, r)$ for which $l$ is an
entrance boundary, and look at the problem of evaluating, for instance,
$w(l) = E[\int_0^{T(b)} g(X(s))\,ds \mid X(0) = l]$, where $T(b)$ is the hitting time for $b$. Because $l$
is inaccessible,

$$T(b) = \lim_{a \downarrow l} T(a) \wedge T(b),$$

$$w(l) = \lim_{x \downarrow l}\lim_{a \downarrow l} w_{a,b}(x) = \lim_{x \downarrow l}\lim_{a \downarrow l} E\left[\int_0^{T(a) \wedge T(b)} g(X(s))\,ds \,\Big|\, X(0) = x\right].$$

Thus $w(l)$ may be evaluated by taking the appropriate limits in formula (3.11)
of Section 3.
To show that a boundary $l$ is entrance it suffices to establish that
$S(l, x] = \infty$ while $N(l) < \infty$.


(d) Natural (Feller) Boundary 


A diffusion process can neither reach a natural boundary in finite mean time
nor be started from one. Natural boundaries are omitted from the state
space, so that, for example, if $l$ is a natural boundary then the state space may
be taken of the form $(l, r)$ or $(l, r]$.

To establish that a boundary $l$ is natural in the Feller sense one needs to
show that both $\Sigma(l) = \infty$ and $N(l) = \infty$.
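
The four sufficiency criteria stated in (a) through (d) can be collected into one small decision function. The Python sketch below takes four booleans recording whether $S(l, x]$, $M(l, x]$, $\Sigma(l)$, and $N(l)$ are finite and returns the Feller class. The function name `feller_class` and the calling convention are assumptions made for the illustration; the inputs are expected to be one of the six realizable rows of Table 6.1, and only some impossible inputs are detected.

```python
def feller_class(S_finite, M_finite, Sigma_finite, N_finite):
    """Classify a boundary from the finiteness of S(l,x], M(l,x], Sigma(l), N(l)."""
    if S_finite and M_finite:
        return "regular"
    if Sigma_finite and not M_finite:
        return "exit"
    if not S_finite and N_finite:
        return "entrance"
    if not Sigma_finite and not N_finite:
        return "natural"
    raise ValueError("combination excluded by Lemma 6.3")

# The six realizable rows of Table 6.1, in the column order S, M, Sigma, N.
realizable = {
    1: (True, True, True, True),
    6: (True, False, True, False),
    8: (True, False, False, False),
    11: (False, True, False, True),
    12: (False, True, False, False),
    16: (False, False, False, False),
}
classes = {row: feller_class(*flags) for row, flags in realizable.items()}
print(classes)
```

Rows (8), (12), and (16) all come out as natural, which is why only four classes are needed for six rows.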
The analogous criteria for the upper boundary $r$ are

$$S[x, r) = \lim_{b \uparrow r} S[x, b], \qquad M[x, r) = \lim_{b \uparrow r} M[x, b],$$

$$\Sigma(r) = \int_x^r M[x, \xi]\,dS(\xi) = \int_x^r S[\eta, r)\,dM(\eta),$$

and

$$N(r) = \int_x^r S[x, \xi]\,dM(\xi) = \int_x^r M[\eta, r)\,dS(\eta).$$

It would be desirable to relate the boundary classification stipulated above 
with the usual notions of ergodicity, null recurrence, and the transientness of 
the process. When both boundary points represent natural boundaries, then 
although $l$ and $r$ are unattainable neither of the possibilities

$$\Pr\left\{\lim_{t \to \infty} X(t) = l \,\Big|\, X(0) = x\right\} = 1;$$

$$\Pr\left\{\liminf_{t \to \infty} X(t) = l,\ \limsup_{t \to \infty} X(t) = r \,\Big|\, X(0) = x\right\} = 1$$

is precluded. There may or may not exist a stationary measure approached by
the probability distribution of $X(t)$ as $t \to \infty$. Later examples will verify that all
possibilities indeed arise.

Where both boundaries are exit, the process is transient and absorption 
into one of the two boundaries occurs quite rapidly. It will be established later 
that a diffusion process displaying entrance boundaries at both ends l and r 
has a limiting stationary distribution. 

Prior to developing several equivalent characterizations of the boundary 
classification we examine the boundary behavior in the above formulation for 
a number of concrete diffusion models important in the physical and biological 
sciences. 


Example 4. Standard Brownian motion [$(l, r) = (-\infty, \infty)$]. Recall that
$dS(\xi) = s(\xi)\,d\xi$ has $s(\xi) \equiv 1$ and $dM(\xi) = m(\xi)\,d\xi$ has $m(\xi) \equiv 1$. An elementary
computation reveals that

$$\Sigma(l) = N(l) = \infty,$$

so that $l = -\infty$ is a natural boundary and the same is true for the right boundary
$r = \infty$.

The detailed discussion of the one-dimensional Brownian motion process
covered in Chapter 7 indicates that the $\pm\infty$ points are never attained although
every finite point can be reached with probability one in finite time. Thus the 
process is recurrent. The first passage probability distribution associated with 
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an arbitrary level a commencing from the origin, displayed explicitly in 
formula (3.4) of Chapter 7, has an infinite first moment. 

Furthermore, in the development of the sample path behavior (consult
especially Chapter 12) we found

$$\Pr\left\{\limsup_{t \to \infty} X(t) = +\infty,\ \liminf_{t \to \infty} X(t) = -\infty\right\} = 1,$$

i.e., the trajectories of the process oscillate continuously between $+\infty$ and
$-\infty$ infinitely often so that the process is actually null recurrent.


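Example 5. Ornstein-Uhlenbeck (O.U.) process [$(l, r) = (-\infty, \infty)$, $\mu(\xi) = -\xi$,
$\sigma^2(\xi) = 1$]. We obtain

$$s(\xi) = e^{\xi^2}, \qquad m(\xi) = e^{-\xi^2}.$$

In order to ascertain the character of the boundary $+\infty$ we need to estimate

$$\Sigma(\infty) = \int_x^\infty \left\{\int_x^\xi dM(\eta)\right\} dS(\xi) \quad\text{and}\quad N(\infty) = \int_x^\infty \left\{\int_x^\eta dS(\xi)\right\} dM(\eta).$$

For the case at hand, we have

$$N(\infty) = \int_x^\infty \left(\int_x^\eta e^{\xi^2}\,d\xi\right) e^{-\eta^2}\,d\eta,$$

where $x$ is for convenience taken large but fixed. Observe through exercising
twice integration by parts

$$\int_x^\eta e^{\xi^2}\,d\xi = \frac{1}{2}\int_x^\eta \frac{1}{\xi}\,2\xi e^{\xi^2}\,d\xi = \frac{1}{2}\left[\frac{e^{\xi^2}}{\xi}\right]_x^\eta + \frac{1}{4}\left[\frac{e^{\xi^2}}{\xi^3}\right]_x^\eta + \frac{3}{4}\int_x^\eta \frac{e^{\xi^2}}{\xi^4}\,d\xi$$

and therefore

$$\int_x^\eta e^{\xi^2}\,d\xi \sim \frac{e^{\eta^2}}{2\eta} \quad\text{as } \eta \to \infty.$$

By choosing $x$ large enough and employing the above asymptotic relation we
deduce that $N(\infty) = \infty$. It is simpler to verify that $\Sigma(\infty) = \infty$.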

The boundary classification asserts that both $-\infty$ and $+\infty$ are natural
boundaries. However, recall (cf. Proposition 5.1) that the Ornstein-Uhlenbeck
process is strongly ergodic, meaning there exists a unique normal stationary
distribution to which the process converges as $t \to \infty$ (cf. Section 5). Brownian
motion (B.M.) and the Ornstein-Uhlenbeck (O.U.) process share the common
state space, the real line with $\pm\infty$ being natural boundaries in both cases.
It is worthwhile to highlight one of the principal differences in the two 
processes. The former process (B.M.) depicts a null recurrent sample path
behavior while the latter process (O.U.) is positively ergodic, due to the
restoring force directed toward the origin.


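Example 6. Bessel Process. The Bessel process $\{Y(t)\}$ of parameter $\alpha \ge 0$ is
the one-dimensional diffusion process on $[0, \infty)$ having the infinitesimal
coefficients

$$\mu(x) = \frac{\alpha - 1}{2x}, \qquad \sigma^2(x) = 1. \tag{6.13}$$

The transition density is explicitly given by

$$p(t, x, y) = \frac{\exp[-(x^2 + y^2)/2t]}{t(xy)^{\alpha/2 - 1}}\,y^{\alpha - 1} I_{\alpha/2 - 1}\left(\frac{xy}{t}\right), \quad t > 0,\ x, y > 0, \tag{6.14}$$

where $I_\nu(z)$ is the modified Bessel function

$$I_\nu(z) = \sum_{k=0}^\infty \frac{(z/2)^{2k + \nu}}{k!\,\Gamma(k + \nu + 1)}. \tag{6.15}$$

When $\alpha = n$, a positive integer, then $Y(t)$ is the radial part of an $n$-dimensional
Brownian motion (see Example E of Section 2 and Chapter 7, Section 6).
The scale and speed functions are readily computed to be

$$s(\eta) = \exp\left\{-\int^\eta \frac{\alpha - 1}{\xi}\,d\xi\right\} = \eta^{1 - \alpha},$$

$$S(\xi) = \begin{cases} \dfrac{\xi^{2 - \alpha}}{2 - \alpha} & \text{if } \alpha \ne 2, \\[1ex] \ln \xi & \text{if } \alpha = 2, \end{cases}$$

$$m(\eta) = \eta^{\alpha - 1}.$$

We examine the boundary point 0. Accordingly,

$$\Sigma(0) = \int_0^1 \left[\int_\xi^1 m(\eta)\,d\eta\right] s(\xi)\,d\xi = \frac{1}{\alpha}\int_0^1 (1 - \xi^\alpha)\xi^{1 - \alpha}\,d\xi = \frac{1}{\alpha}\int_0^1 (\xi^{1 - \alpha} - \xi)\,d\xi \begin{cases} < \infty & \text{if } \alpha < 2, \\ = \infty & \text{if } \alpha \ge 2, \end{cases}$$

$$N(0) = \int_0^1 \left[\int_\eta^1 s(\xi)\,d\xi\right] m(\eta)\,d\eta = \frac{1}{2 - \alpha}\int_0^1 (1 - \eta^{2 - \alpha})\eta^{\alpha - 1}\,d\eta = \frac{1}{2 - \alpha}\int_0^1 (\eta^{\alpha - 1} - \eta)\,d\eta \begin{cases} < \infty & \text{if } \alpha > 0, \\ = \infty & \text{if } \alpha = 0. \end{cases}$$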
To sum up,

$$\text{the boundary 0 is } \begin{cases} \text{an entrance boundary} & \text{for } \alpha \ge 2, \\ \text{a regular boundary} & \text{for } 0 < \alpha < 2, \\ \text{an exit boundary} & \text{for } \alpha = 0. \end{cases} \tag{6.17}$$

The infinite point $+\infty$ is a natural boundary for all specifications of $\alpha$.
(Check this.) The interpretation ascribed to (6.17) is the usual one. With $\alpha > 2$,
and commencing the Bessel process at a point arbitrarily near 0, almost all
the trajectories move onto the positive axis and ultimately drift to $+\infty$.
However, where the constant drift rate in the positive direction is not of
sufficient strength to the extent that $0 < \alpha < 2$, then 0 can be attained in finite
expected time. To delimit the process completely in this case it is essential to 
describe the probabilistic laws governing the path behavior whenever the 
boundary point 0 is reached. 

For a regular boundary a variety of boundary behavior can be prescribed 
in a consistent way, including the contingencies of complete absorption or 
reflecting, elastic or sticky barrier phenomena, and even the possibility of the 
particle (path), when attaining the boundary point, waiting there for an 
exponentially distributed duration followed by a jump into the interior of the 
state space according to a specified probability distribution function. In the 
latter event, the process only exhibits continuous sample paths over the 
interior of the state space. 

For $\alpha = 0$ the 0 point behaves as an exit boundary with resulting total
absorption in finite time.
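
For the Bessel process the four boundary functionals at 0 reduce to power integrals, so the classification can be tabulated directly. The Python sketch below uses scale density $\eta^{1-\alpha}$ and speed density $\eta^{\alpha-1}$ with reference point $x = 1$; under that assumption the closed forms are $S(0, 1] = 1/(2-\alpha)$ and $\Sigma(0) = 1/[2(2-\alpha)]$ for $\alpha < 2$, and $M(0, 1] = 1/\alpha$ and $N(0) = 1/(2\alpha)$ for $\alpha > 0$, each being infinite otherwise. These closed forms were worked out for this illustration and the function names are hypothetical.

```python
import math

def bessel_criteria(alpha):
    """Return (S(0,1], M(0,1], Sigma(0), N(0)) for the Bessel process, alpha >= 0."""
    S = 1.0 / (2.0 - alpha) if alpha < 2.0 else math.inf
    M = 1.0 / alpha if alpha > 0.0 else math.inf
    Sigma = 1.0 / (2.0 * (2.0 - alpha)) if alpha < 2.0 else math.inf
    N = 1.0 / (2.0 * alpha) if alpha > 0.0 else math.inf
    return S, M, Sigma, N

def classify_zero(alpha):
    """Feller class of the boundary 0, using the sufficiency criteria of this section."""
    S, M, Sigma, N = bessel_criteria(alpha)
    if S < math.inf and M < math.inf:
        return "regular"
    if Sigma < math.inf and M == math.inf:
        return "exit"
    if S == math.inf and N < math.inf:
        return "entrance"
    return "natural"

for alpha in (0.0, 1.0, 1.5, 2.0, 3.0):
    print(alpha, classify_zero(alpha))
```

For $0 < \alpha < 2$ the values also satisfy the product identity of Lemma 6.3, for example $1 + 1/3 = 2 \cdot 2/3$ at $\alpha = 1.5$.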


Example 7. Boundary classification of a population growth model. The
diffusion process described by

$$I = [l, r) = [0, \infty), \quad \mu(\xi) = \beta\xi, \quad\text{and}\quad \sigma^2(\xi) = \alpha\xi \quad (\alpha > 0,\ \beta > 0)$$

is a continuous state analog of a branching process. We find

$$dS(x) = e^{-2\beta x/\alpha}\,dx, \qquad dM(x) = \frac{1}{\alpha x}\,e^{2\beta x/\alpha}\,dx.$$

At the boundary 0 we get

$$\Sigma(0) < \infty, \qquad N(0) = \infty$$

and so 0 is an exit boundary. The infinite point is a natural boundary. Consult
Elementary Problem 10 and Problem 9 for further insights into this model.


Example 8. Wright-Fisher gene frequency model involving mutation pressures.
The model in Case (a), Example F of Section 2 has parameters

$$I = [l, r] = [0, 1], \quad \mu(\xi) = \gamma_2(1 - \xi) - \gamma_1\xi \quad\text{and}\quad \sigma^2(\xi) = \xi(1 - \xi) \quad (\gamma_1, \gamma_2 \ge 0).$$
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A direct calculation gives 


=D. * (272 27i 
— log s(x) -Í a3 dé -Í > =a 1 z) dé = log[x?”(1 — x)2”], 


x x 1 


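The speed measure is
$$dM(x) = x^{2\gamma_1 - 1}(1 - x)^{2\gamma_2 - 1}\,dx.$$
The description of the boundaries 0 and 1 depends on the range of $\gamma_1$, $\gamma_2$
determined by combination of 16 cases delimited by the inequalities

$$\gamma_1 = 0, \quad 0 < \gamma_1 < \tfrac12, \quad \gamma_1 = \tfrac12, \quad \gamma_1 > \tfrac12,$$

$$\gamma_2 = 0, \quad 0 < \gamma_2 < \tfrac12, \quad \gamma_2 = \tfrac12, \quad \gamma_2 > \tfrac12.$$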


We examine the character of the boundary 0. (The endpoint 1 is treated by 
similar means.) 


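$$\Sigma(0) = \int_0^{1/2}\left(\int_\xi^{1/2} dM(\eta)\right) dS(\xi) = \int_0^{1/2}\left[\int_\xi^{1/2} \eta^{2\gamma_1 - 1}(1 - \eta)^{2\gamma_2 - 1}\,d\eta\right] dS(\xi) \approx \int_0^{1/2} (c_1\xi^{2\gamma_1} + c_2)\,\xi^{-2\gamma_1}\,d\xi$$

($\approx$ signifies "is of the order") where $c_2$ is a positive constant.

The above quantity is finite iff $2\gamma_1 < 1$. We deduce straightforwardly that
$N(0) = \int_0^{1/2}\left(\int_\xi^{1/2} dS(\eta)\right) dM(\xi)$ is finite provided $\gamma_1 > 0$ and infinite for $\gamma_1 = 0$. It
follows that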

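$$0 \text{ is } \begin{cases} \text{an exit boundary} & \text{for } \gamma_1 = 0,\\ \text{a regular boundary} & \text{for } 0 < \gamma_1 < \tfrac12,\\ \text{an entrance boundary} & \text{for } \gamma_1 \ge \tfrac12. \end{cases} \tag{6.18}$$

In a similar manner, we find that

$$1 \text{ is } \begin{cases} \text{an exit boundary} & \text{for } \gamma_2 = 0,\\ \text{a regular boundary} & \text{for } 0 < \gamma_2 < \tfrac12,\\ \text{an entrance boundary} & \text{for } \gamma_2 \ge \tfrac12. \end{cases} \tag{6.19}$$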


The interpretation of the facts of (6.18) and (6.19) is as follows. When $\gamma_1$ and
$\gamma_2$ exceed or equal $\tfrac12$ (implying that 0 and 1 are entrance boundaries) then the
process attains a unique stationary distribution (the process is strongly
ergodic) whose explicit density is

$$\psi(x) = \frac{\Gamma(2\gamma_1 + 2\gamma_2)}{\Gamma(2\gamma_1)\Gamma(2\gamma_2)}\,x^{2\gamma_1 - 1}(1 - x)^{2\gamma_2 - 1}, \qquad 0 < x < 1. \tag{6.20}$$

This stationary density function is also meaningful for the range $0 < \gamma_1 < \tfrac12$
and $0 < \gamma_2 < \tfrac12$ but the boundary points of the process are now regular and it
is essential to spell out the behavior of the process whenever 0 and/or 1 is
attained. In most of the genetic applications of this mutation model the 
“appropriate” boundary condition to be imposed is that of a reflecting barrier. 
The rationale for this statement derives by examining more carefully the 
boundary behavior for the finite Wright-Fisher Markov chain approximating 
process to the above diffusion process. (On this point, see Example F of 
Section 2.) 

In the regular case it is sometimes meaningful to impose an exponentially 
distributed waiting time on the path behavior at the boundaries and after each 
sojourn at the boundary have the process continue from a random point of 
(0,1) whose distribution is (6.20). The correct stationary measure of such a 
modified process involves mass jumps at the boundaries 0 and 1 plus a density 
portion of the form (6.20). 

The case $\gamma_1 = \gamma_2 = 0$ is of course the model with absorbing boundaries
depicting fluctuation of gene frequency in the presence of no mutation forces. 
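
The classification (6.18)-(6.19) and the stationary density (6.20) of Example 8 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, the density is taken to be the Beta form with parameters $2\gamma_1$, $2\gamma_2$, and the normalization is verified with a plain midpoint rule.

```python
import math

def classify_boundary(gamma):
    """Boundary type from (6.18)/(6.19) for a mutation rate gamma >= 0."""
    if gamma < 0:
        raise ValueError("gamma must be nonnegative")
    if gamma == 0:
        return "exit"
    if gamma < 0.5:
        return "regular"
    return "entrance"

def wf_stationary_density(x, g1, g2):
    """Density (6.20) at x in (0, 1), assuming g1 > 0 and g2 > 0."""
    norm = math.gamma(2 * g1 + 2 * g2) / (math.gamma(2 * g1) * math.gamma(2 * g2))
    return norm * x ** (2 * g1 - 1) * (1 - x) ** (2 * g2 - 1)

def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Both boundaries entrance (g1 = 1, g2 = 1.5): the density should integrate to 1.
total_mass = midpoint_integral(lambda x: wf_stationary_density(x, 1.0, 1.5), 0.0, 1.0)
```

The midpoint rule is chosen only because it never evaluates the density at the endpoints, where it can be singular in the regular case $0 < \gamma_i < \tfrac12$.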


ENTRANCE BOUNDARIES AND STATIONARY MEASURES 


For a one-dimensional regular diffusion where both ends constitute entrance
boundaries we should expect that the process evolves ($t \to \infty$) to a unique
limiting stationary distribution. The fact that the process $X(t)$ is strongly
ergodic (i.e., settles as $t \to \infty$ to its stationary distribution) requires a more
recondite analysis which we will not enter into.

We pointed out that a stationary density can be formally calculated from 
(5.31) to yield 


$$\psi(x) = m(x)[C_1 S(x) + C_2]. \tag{6.21}$$

Because $l$ and $r$ are entrance boundaries, $S(x)$, which is monotonic
throughout $(l, r)$, must increase to $\infty$ as $x \uparrow r$, and must decrease to $-\infty$ as
$x \downarrow l$. (Consult Table 6.2.) To maintain $\psi(x)$ positive throughout $(l, r)$, we must
have $C_1 = 0$. Then $C_2$ is chosen to ensure that $\int_l^r \psi(x)\,dx = 1$. Thus, the unique
stationary density is

$$\psi(x) = \frac{m(x)}{\int_l^r m(\xi)\,d\xi}, \qquad l < x < r. \tag{6.22}$$




The denominator in (6.22) is finite because both boundaries are entrance.
Indeed, the argument just cited requires only that $S(l, x] = S[x, r) = \infty$ while
$M(l, x] < \infty$ and $M[x, r) < \infty$, and this holds for certain natural boundaries in
addition to entrance boundaries. (Consult Table 6.2.)


7: Some Further Characterizations of Boundary Behavior 


For the sake of variety, we focus in this section on the right boundary r, rather 
than on the left boundary point l, on which the developments of Section 6 are 
based. The relevant integrals of the boundary classification are 


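$$\Sigma(r) = \lim_{b \uparrow r}\int_x^b S[\eta, b]\,dM(\eta) = \int_x^r S[\eta, r)\,dM(\eta) = \int_x^r M[x, \xi]\,dS(\xi), \tag{7.1}$$

$$N(r) = \lim_{b \uparrow r}\int_x^b S[x, \eta]\,dM(\eta) = \int_x^r S[x, \eta]\,dM(\eta) = \int_x^r M[\xi, r)\,dS(\xi), \tag{7.2}$$

where

$$S[x, r) = \lim_{b \uparrow r} S[x, b], \tag{7.3}$$
and
$$M[x, r) = \lim_{b \uparrow r} M[x, b]. \tag{7.4}$$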


We recall a number of facts pertaining to first passage probabilities. Let $T_b$
denote the first hitting time of the level $b$. The random variable $T_b$ is well
defined in an extended sense that allows infinite values by the prescription

$$T_b = \begin{cases} \infty & \text{if } X(t) \ne b \text{ for all } t \ge 0,\\ \inf\{t \ge 0 : X(t) = b\} & \text{otherwise.} \end{cases}$$
In some circumstances, for instance where certain boundaries are attainable
and absorbing, $T_b$ can take the value $\infty$ with positive probability.

Remark 7.1. For $x < c$, the random variables $T_c$ increase as $c \uparrow r$, and
consequently we may define the limit random variable

$$T_{r-} = \lim_{c \uparrow r} T_c \le \infty.$$

We claim $T_{r-} = T_r$, the hitting time to $r$. Certainly $T_{r-} \le T_r$ since $T_c \le T_r$ for all
$c$. Thus, if $T_{r-} = \infty$, then $T_r = \infty$ and $T_{r-} = T_r$. It remains to consider the case
$T_{r-} < \infty$. Then $X(T_{r-}) = \lim_{c \uparrow r} X(T_c) = \lim_{c \uparrow r} c = r$, so that

$$T_{r-} \ge T_r = \inf\{t \ge 0 : X(t) = r\}.$$
We have shown both $T_{r-} \le T_r$ and $T_r \le T_{r-}$. Thus $T_{r-} = T_r$.
An important functional of these hitting time random variables is the
expectation

$$u(x) = E_x[e^{-\alpha T_b}] = E_x\left[\exp\left\{-\int_0^{T_b} \alpha g(X(s))\,ds\right\}\right], \qquad b < x < r, \tag{7.5}$$

with $g(u) \equiv 1$ in the form as appears in Eq. (3.41). Of course, when $T_b = \infty$,
then $e^{-\alpha T_b} = 0$ and therefore $u(x) = E_x[e^{-\alpha T_b}; T_b < \infty]$, the qualification under
the expectation signifying that the contributions to the expectation occur only
for the sample paths over which $T_b$ is finite. The arguments of Section 3
indicate that $u(x)$ satisfies the differential equation [cf. (3.42)]

$$Lu(x) = \tfrac12\sigma^2(x)u''(x) + \mu(x)u'(x) = \alpha u(x) \tag{7.6}$$

with the boundary condition $u(b) = 1$.
It is convenient to write (7.6) in its canonical form [see (3.9)]:

$$Lu(x) = \frac12\,\frac{d}{dM}\,\frac{du}{dS}(x) = \frac{1}{2m(x)}\,\frac{d}{dx}\left[\frac{1}{s(x)}\,\frac{du(x)}{dx}\right] = \alpha u(x). \tag{7.7}$$

A one-dimensional diffusion process cannot go from state $b$ to state $d$
without visiting all intermediate points. Coupled with the strong Markov
property, this fact leads to an important identity.


Lemma 7.1. Consider points $l < b < c < d < r$. Then

(i) $$E_d[e^{-\alpha T_b}] = E_d[e^{-\alpha T_c}]\,E_c[e^{-\alpha T_b}] \tag{7.8}$$
and
(ii) $$E_b[e^{-\alpha T_d}] = E_b[e^{-\alpha T_c}]\,E_c[e^{-\alpha T_d}]. \tag{7.9}$$


Proof. Clearly, any sample path commencing at $d$ and reaching $b$ must pass
through $c$ first. This is expressed by the relationship $T_b = T_c + T_b^+(c)$, according
to which a sample path starting at $d$ and reaching the level $b$ first achieves
$c$ at time $T_c$, and from $c$, first reaches $b$ in the time length $T_b^+(c)$. The
superscript $+$ refers to the sample path after time $T_c$.

Invoking the law of total probabilities, conditioning on the bivariate
random variable $(X(T_c), T_c)$, we have

$$E_d[e^{-\alpha T_b}] = E_d[e^{-\alpha(T_c + T_b^+(c))}] = E_d\big[E_d[e^{-\alpha(T_c + T_b^+(c))} \mid (X(T_c), T_c)]\big]$$
and since $T_c$ is trivially determined conditioned on $(X(T_c), T_c)$,

$$= E_d\big[e^{-\alpha T_c}\,E_d[e^{-\alpha T_b^+(c)} \mid (X(T_c), T_c)]\big].$$
Appeal to the strong Markov property now allows us to discard the conditioning
clause and to reduce the inner expression to $E_c[e^{-\alpha T_b}]$. That the process
starts afresh at $c$ conditioned on $T_c$ is exactly what the strong Markov property
affirms. The combined result is (7.8).

The verification of (7.9) is entirely similar. ∎
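
The identities of Lemma 7.1 can be illustrated in the one case where the Laplace transform of the hitting time is elementary. The sketch below is an illustration only and is not taken from the text: it assumes standard Brownian motion ($\mu = 0$, $\sigma^2 = 1$), for which $E_x[e^{-\alpha T_b}] = e^{-\sqrt{2\alpha}\,|x - b|}$, and the helper name is hypothetical.

```python
import math

def laplace_hitting(x, b, alpha):
    """E_x[exp(-alpha * T_b)] for standard Brownian motion started at x."""
    return math.exp(-math.sqrt(2.0 * alpha) * abs(x - b))

alpha = 0.7
b, c, d = 0.2, 1.0, 2.5          # any points with b < c < d

# (7.8): from d down to b, passing through c.
lhs_78 = laplace_hitting(d, b, alpha)
rhs_78 = laplace_hitting(d, c, alpha) * laplace_hitting(c, b, alpha)

# (7.9): from b up to d, passing through c.
lhs_79 = laplace_hitting(b, d, alpha)
rhs_79 = laplace_hitting(b, c, alpha) * laplace_hitting(c, d, alpha)
```

In this special case the multiplicative property reduces to additivity of the distances $|d - c| + |c - b| = |d - b|$ in the exponent.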


For $b < c$ the hitting time random variables $T_c$ increase as $c$ increases.
Therefore, $E_b[e^{-\alpha T_c}]$, which is certainly bounded between 0 and 1, is decreasing
as $c$ increases to $r$ and increasing as $b$ increases. It follows that $\lim_{c \uparrow r} E_b[e^{-\alpha T_c}]$
exists and is an increasing function of $b$, and consequently the double limit
$\lim_{b \uparrow r}\lim_{c \uparrow r} E_b[e^{-\alpha T_c}]$ exists. We now proceed to construct two relevant
functionals of the boundary. We again take $b < c < r$ and define, for some
$\alpha > 0$,

$$\varphi(r) = \lim_{b \uparrow r}\lim_{c \uparrow r} E_b[e^{-\alpha T_c}] = \lim_{b \uparrow r} E_b[e^{-\alpha T_r}], \tag{7.10}$$

$$\psi(r) = \lim_{b \uparrow r}\lim_{c \uparrow r} E_c[e^{-\alpha T_b}]. \tag{7.11}$$


Their interpretations and relevance for characterizing boundary behavior 
will be elaborated below. 


Lemma 7.2. Each of the quantities $\varphi(r)$ and $\psi(r)$ is either 0 or 1.


Proof. Consider $\psi$ first and for definiteness take $\alpha = 1$. Let $b < c < d < r$.

In the identity (7.8) we first let $d \uparrow r$, then $c \uparrow r$, and finally $b \uparrow r$, in that
order. With these operations (7.8) yields $\psi(r) = [\psi(r)]^2$ and this equation is
only feasible provided $\psi(r)$ equals zero or unity.

In order to prove the result for $\varphi(r)$ we commence from identity (7.9).
Letting $d \uparrow r$, $c \uparrow r$, and then $b \uparrow r$, in that order, yields $\varphi(r) = [\varphi(r)]^2$, implying
$\varphi(r) = 0$ or 1. This concludes the proof of the lemma. ∎


We have assumed that $\alpha = 1$ in the above; it is obvious that the same proof
works for any $\alpha > 0$.


Remark 7.2. We pointed out earlier that if $X(0) = b < c$, then the sequence
of random variables $T_c$ increases as $c \uparrow r$ and $T_r = \lim_{c \uparrow r} T_c$. We assert that
$$\varphi(r) = 1 \text{ if and only if } \Pr\{T_r < \infty \mid X(0) = b\} > 0. \tag{7.12}$$
When $r$ is an attracting boundary, then $S[b, r) < \infty$ and
$$\Pr\{T_{r-} \le T_a \mid X(0) = b\} > 0 \quad\text{for } a < b < r.$$

The statement that $\Pr\{T_r < \infty \mid X(0) = b\} > 0$, now being affirmed as equivalent
to $\varphi(r) = 1$, is a stronger statement. It corresponds to $r$ being an attainable
boundary. (See Examples 1-3 of Section 6 and Lemma 6.2.) Equivalent to
(7.12) is the assertion that $\varphi(r) = 0$ if and only if $\Pr\{T_r < \infty \mid X(0) = b\} = 0$.
To show this, suppose $b < c < r$. Then

$$\lim_{c \uparrow r} E_b[e^{-\alpha T_c}] = \lim_{c \uparrow r}\big(E_b[e^{-\alpha T_c}; T_{r-} = \infty] + E_b[e^{-\alpha T_c}; T_{r-} < \infty]\big). \tag{7.13}$$

If $\Pr\{T_{r-} < \infty \mid X(0) = b\} > 0$, then the right-hand side of (7.13) is strictly
positive. Since $E_b[e^{-\alpha T_r}] \le E_c[e^{-\alpha T_r}]$ for $b < c < r$, it follows that $\varphi(r) > 0$,
and hence $\varphi(r) = 1$. If $\Pr\{T_r < \infty \mid X(0) = b\} = 0$, then

$$\Pr\{T_{r-} = \infty \mid X(0) = b\} = 1,$$
and the right-hand side of (7.13) vanishes. Hence $\varphi(r) = 0$.


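Remark 7.3. $\psi(r) = 1$ if and only if $\lim_{c \uparrow r} E_c[e^{-\alpha T_b}] > 0$. This follows directly
from its definition (7.11), and the relation $E_{r-}[e^{-\alpha T_b}] \le E_{r-}[e^{-\alpha T_c}]$ for $b < c < r$,
in conjunction with Lemma 7.2. Note that in this case there exists $M < \infty$ such
that $\lim_{c \uparrow r}\Pr\{T_b < M \mid X(0) = c\} > 0$. In this sense, the process can escape
from $r$ into the interior.

We already introduced the function (7.5)

$$u(x) = E_x[e^{-\alpha T_b}], \qquad b < x < r, \tag{7.14}$$

which satisfies the differential equation (7.6). We shall also need the function

$$v(x) = \frac{1}{E_b[e^{-\alpha T_x}]}, \qquad b < x < r. \tag{7.15}$$

It is also correct that $v(x)$ satisfies the differential equation

$$\tfrac12\sigma^2(x)v''(x) + \mu(x)v'(x) = \alpha v(x), \qquad b < x < r, \tag{7.16}$$

together with the boundary condition $v(b) = 1$. Indeed, let $b < x < c < r$.
Then, referring to (7.9) in Lemma 7.1, we have

$$E_b[e^{-\alpha T_c}] = E_b[e^{-\alpha T_x}]\,E_x[e^{-\alpha T_c}].$$
Hence
$$v(x) = \frac{1}{E_b[e^{-\alpha T_x}]} = \frac{E_x[e^{-\alpha T_c}]}{E_b[e^{-\alpha T_c}]} = \frac{u(x)}{\text{const}},$$
where $u(x)$ here stands for $E_x[e^{-\alpha T_c}]$, which satisfies (7.6) by the same argument.
Since $v$ differs from $u$ by a constant factor, (7.16) follows from (7.6).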


The next three theorems relate the nature of the process behavior at the 
boundaries to the evaluations of the functionals (7.10) and (7.11). 


Theorem 7.1. $\varphi(r) = 1$ if and only if $\Sigma(r) < \infty$.


Note. $\Sigma(r) < \infty$ implies that $r$ must be a regular or exit boundary.
Proof. Consider $b < x < r$. Assume $\Sigma(r) < \infty$, which by Lemma 6.3(ii) implies
that $S[x, r) < \infty$. Now consider the differential equation for $v$ defined in
(7.16), which we write in the canonical form

$$\tfrac12\,d\left[\frac{dv(x)}{dS}\right] = \alpha v(x)\,dM(x). \tag{7.17}$$

Integrating (7.17) once yields $dv(\eta)/dS - dv(b)/dS = 2\int_b^\eta \alpha v(\xi)\,dM(\xi)$. Integrating
again gives

$$v(x) - v(b) - \frac{dv(b)}{dS}\,[S(x) - S(b)] = 2\int_b^x\left[\int_b^\eta \alpha v(\xi)\,dM(\xi)\right] dS(\eta). \tag{7.18}$$

Noting that $v(b) = 1$ and that $v(x)$ is increasing from its definition, we find
that

$$v(x) \le 1 + \frac{dv(b)}{dS}\,[S(x) - S(b)] + 2\alpha v(x)\Sigma_b(r), \tag{7.19}$$

where the subscript $b$ on $\Sigma$ denotes the lower limit of the integration domain in
(7.1). Now choose $b$ close to $r$ such that $2\alpha\Sigma_b(r) < \tfrac12$, which is possible since we
have assumed $\Sigma(r) < \infty$. With this choice of $b$, (7.19) reduces to

$$\tfrac12 v(x) \le 1 + \frac{dv(b)}{dS}\,[S(x) - S(b)]. \tag{7.20}$$

Letting $x \uparrow r$ in Eq. (7.20), we deduce $v(r) < \infty$ since $S(r) < \infty$. Comparing to
(7.15) we have $\lim_{x \uparrow r} E_b[e^{-\alpha T_x}] > 0$. From this fact, paraphrasing the argument
of Remark 7.1, it follows that $\varphi(r) = 1$.

On the other hand, suppose $\Sigma(r) = \infty$. From the identity (7.18), since $v(\xi)$ is
increasing, we obtain the lower estimate

$$v(x) = 1 + \frac{dv(b)}{dS}\,[S(x) - S(b)] + 2\alpha\int_b^x\left[\int_b^\eta v(\xi)\,dM(\xi)\right] dS(\eta) \ge 1 + \frac{dv(b)}{dS}\,[S(x) - S(b)] + 2\alpha v(b)\Sigma_b(x). \tag{7.21}$$

Letting $x \to r$ implies $v(r) = \infty$. Hence, $\lim_{x \uparrow r} E_b[e^{-\alpha T_x}] = 0$, or $\varphi(r) = 0$. ∎


Theorem 7.2. (i) If $r$ is an entrance boundary, then $\psi(r) = 1$.
(ii) If $r$ is an exit boundary or natural boundary, then $\psi(r) = 0$.
(iii) If $r$ is a regular boundary, then $\psi(r) = 0$ or 1.


Proof. (i) Suppose $r$ is an entrance boundary. In particular, we have that
$N(r) < \infty$. Let $a < c < r$, and recall $N(r) = \int_x^r S[x, \eta]\,dM(\eta) = \int_x^r M[\xi, r)\,dS(\xi)$.
Now we use the property that $u(x)$ of (7.5) satisfies $Lu(x) = \alpha u(x)$ together with
$u(b) = 1$, which again for convenience is displayed in the form

$$d\left[\frac{du}{dS}\right] = 2\alpha u(x)\,dM(x).$$

Its first integral is

$$\frac{du(c)}{dS} - \frac{du(a)}{dS} = \int_a^c 2\alpha u(\eta)\,dM(\eta). \tag{7.22}$$

In order to proceed we need to establish the following:
Claim. $\lim_{c \uparrow r} du(c)/dS = 0$. To argue for this assertion note that

$$u(x) = E_x[e^{-\alpha T_b}], \qquad b < x < r,$$

is monotone decreasing and $u(x) \ge 0$. Hence $[1/s(x)]\,du/dx = du/dS$ is negative,
while $Lu = \alpha u$ implies

$$\frac{d}{dx}\left\{\frac{du}{dS}\right\} = \frac{d}{dx}\left[\frac{1}{s(x)}\,\frac{du}{dx}\right] = 2m(x)[\alpha u(x)] > 0.$$

The last inequality implies that $du(x)/dS$ is increasing. Therefore,

$$-1 \le -u(b) \le u(x) - u(b) = \int_b^x \frac{du}{dS}\,dS \le \frac{du(x)}{dS}\,[S(x) - S(b)] \le 0. \tag{7.23}$$

The assumption that $r$ is an entrance boundary entails
$$S[b, r) = S(r) - S(b) = \infty.$$
(See Table 6.2.) Finally, sending $x \uparrow r$ in (7.23), we see that $du(x)/dS \to 0$ as $x \uparrow r$.
This validates the claim.

Resuming the proof of the theorem, we let $c \uparrow r$ and use the just demonstrated
fact that $\lim_{c \uparrow r} du(c)/dS = 0$ to reduce (7.22) to

$$-\frac{du(a)}{dS} = 2\int_a^r \alpha u(x)\,dM(x).$$

Integrating this equation with respect to $dS$ yields

$$-u(y) + u(c) = \int_c^y\left[\int_a^r 2\alpha u(x)\,dM(x)\right] dS(a). \tag{7.24}$$

Now let $y \uparrow r$ in (7.24). Since $u(x)$ is decreasing, this gives

$$-u(r) + u(c) \le 2\alpha u(c)N_c(r) < \infty. \tag{7.25}$$

Choose $c$ close to $r$, fulfilling $\alpha N_c(r) < \tfrac14$. Then $u(r) \ge \tfrac12 u(c) > 0$ and so $u(r) > 0$.
Since $u(r) = \lim_{x \uparrow r} E_x[e^{-\alpha T_b}]$, it follows that $\psi(r) = 1$.

(ii) Suppose $r$ is an exit or natural boundary, so that $N(r) = \infty$. Now
integrating $d[du(\xi)/dS] = 2\alpha u(\xi)\,dM(\xi)$ from $x$ to $a$ ($x < a$) produces

$$\frac{du(a)}{dS} - \frac{du(x)}{dS} = 2\alpha\int_x^a u(\xi)\,dM(\xi). \tag{7.26}$$

Integrating (7.26) from $y$ to $a$ with respect to $dS(x)$ ($y < a$) yields

$$\frac{du(a)}{dS}\,[S(a) - S(y)] - [u(a) - u(y)] = 2\alpha\int_y^a\left[\int_x^a u(\xi)\,dM(\xi)\right] dS(x). \tag{7.27}$$

Since it is clear that $0 \le u(a), u(y) \le 1$, we infer

$$1 \ge \frac{du(a)}{dS}\,[S(y) - S(a)] + 2\alpha\int_y^a\left[\int_x^a u(\xi)\,dM(\xi)\right] dS(x). \tag{7.28}$$

Since $S(y)$ is increasing, and $y < a$, $S(y) - S(a) < 0$. Also, we already noted
that $du(a)/dS < 0$. Therefore

$$1 \ge 2\alpha\int_y^a\left[\int_x^a u(\xi)\,dM(\xi)\right] dS(x).$$

By virtue of the property that $u$ is a decreasing function, we deduce

$$1 \ge 2\alpha u(a)\int_y^a\int_x^a dM(\xi)\,dS(x) = 2\alpha u(a)N(a). \tag{7.29}$$

Now let $a \uparrow r$. Then $N(a) \uparrow N(r) = \infty$. In view of inequality (7.29) we must
have $\lim_{a \uparrow r} u(a) = u(r-) = 0$, i.e., $\lim_{x \uparrow r} E_x[e^{-\alpha T_b}] = 0$.

Now let $b \uparrow r$, implying that $\psi(r) = 0$.

(iii) This is simply a restatement of the indeterminacy of $\psi(r)$ for a
regular boundary. By Lemma 7.2, $\psi(r)$ is either 0 or 1.

The proof of Theorem 7.2 is complete. ∎


For our final theorem supplementing the interpretation of the boundary
classifications, we focus on the scale measure in a neighborhood of $r$. Recall
that $S(r) < \infty$ is equivalent to the statement $\lim_{c \uparrow r}\Pr\{T_c < T_b \mid X(0) = a\} > 0$,
where $b < a < c < r$.

Assume that $r$ is entrance or natural, which are the only cases in which the
diffusion cannot approach $r$ in finite time having started at a point $a < r$. Now
we shall inquire whether this diffusion can approach $r$ in infinite time or not;
namely, when does $\Pr\{\lim_{t \to \infty} X(t) = r \mid X(0) = a\} > 0$ hold? The next theorem
shows that $S(r) < \infty$ is a necessary and sufficient condition for this property.

Remark. The positive probability of the event $\{\lim_{t \to \infty} X(t) = r\}$ and the
continuity of paths well justify the term attracting associated with $S(r) < \infty$.


Theorem 7.3. If $r$ is not regular then $S(r) < \infty$ if and only if
$\Pr\{\lim_{t \to \infty} X(t) = r \mid X(0) = a\} > 0$.


Proof. Suppose $S(r) < \infty$. Let $a = a_0 < a_1 < a_2 < \cdots$ be a sequence $a_n \to r$.
The event $\{\lim_{t \to \infty} X(t) = r\}$ certainly contains the event $\{T_{a_2} < \infty$ and
$T^{(n)}_{a_{n+1}} < T^{(n)}_{a_{n-1}}$ for all $n \ge 2\}$ where $T^{(n)}_x$ means the first passage time to $x$
starting from $a_n$. We can use the strong Markov property of the process to
split this event into a sequence of independent events. In fact,

$$\Pr\{\lim_{t \to \infty} X(t) = r \mid X(0) = a\} \ge \Pr\{T_{a_2} < \infty \mid X(0) = a\} \times \prod_{n=2}^{\infty}\Pr\{T_{a_{n+1}} < T_{a_{n-1}} \mid X(0) = a_n\}$$
$$= \Pr\{T_{a_2} < \infty \mid X(0) = a\}\prod_{n=2}^{\infty}\frac{S(a_n) - S(a_{n-1})}{S(a_{n+1}) - S(a_{n-1})}. \tag{7.30}$$

Since $S(r) < \infty$, we can determine the $a_n$ recursively close enough to $r$ so that
$$[S(a_n) - S(a_{n-1})]/[S(r) - S(a_{n-1})] \ge 1 - n^{-2}.$$
Since $S$ is an increasing function, this implies that
$$[S(a_n) - S(a_{n-1})]/[S(a_{n+1}) - S(a_{n-1})] \ge 1 - n^{-2}.$$
Therefore
$$\Pr\{\lim_{t \to \infty} X(t) = r \mid X(0) = a\} \ge \Pr\{T_{a_2} < \infty \mid X(0) = a\}\prod_{n=2}^{\infty}\left(1 - \frac{1}{n^2}\right) > 0.$$

Suppose $S(r) = \infty$. (This happens only when $r$ is an entrance or natural
boundary.) We shall prove that $\Pr\{\lim_{t \to \infty} X(t) = r \mid X(0) = a\} = 0$. We fix a
point $c$, with $c < a < r$. If $c < x < r$, then

$$\Pr\{T_c < \infty \mid X(0) = x\} = \Pr\{\text{there exists a level } b \text{ with } T_c < T_b \mid X(0) = x\}$$
$$= \lim_{b \uparrow r}\Pr\{T_c < T_b \mid X(0) = x\} = \lim_{b \uparrow r}\frac{S(b) - S(x)}{S(b) - S(c)} = 1, \tag{7.31}$$
and therefore,
$$\Pr\{T_c = \infty \mid X(0) = x\} = 0. \tag{7.32}$$

Consider $\Pr\{\lim_{t \to \infty} X(t) = r \mid X(0) = a\}$. A sample path with the property
$\lim_{t \to \infty} X(t) = r$ requires for any $c < r$ that there exists a $t_0$ depending on the
path such that for all $s > 0$, $X(t_0 + s) > c$. Thus, this probability can be
estimated above by

$$\Pr\{\lim_{t \to \infty} X(t) = r \mid X(0) = a\} \le \Pr\{\text{there is some } t \text{ such that } X(t + s) > c \text{ for all } s > 0 \mid X(0) = a\}$$
$$= \lim_{t \to \infty}\Pr\{X(t + s) > c \text{ for all } s > 0 \mid X(0) = a\}$$
$$= \lim_{t \to \infty}\Pr\{T_c^+(t) = \infty,\ c < X(t) < r \mid X(0) = a\}$$

($+$ refers to the sample path after time $t$), and then the Markov property
applies to show this equals
$$\lim_{t \to \infty} E_a[\Pr\{T_c = \infty \mid X(t)\};\ c < X(t) < r] = 0 \quad\text{by (7.31).}$$

Thus, we have proved that $S(r) = \infty$ implies
$$\Pr\{\lim_{t \to \infty} X(t) = r \mid X(0) = a\} = 0.$$

This completes the proof of Theorem 7.3.
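
The positivity of the infinite product used in the first half of the proof can be checked directly. The sketch below is an illustration with a hypothetical helper name: it computes the partial products of $\prod_{n \ge 2}(1 - n^{-2})$, which telescope to $(N + 1)/(2N)$ and therefore decrease to $\tfrac12$.

```python
def partial_product(N):
    """Product of (1 - 1/n^2) over n = 2, ..., N."""
    p = 1.0
    for n in range(2, N + 1):
        p *= 1.0 - 1.0 / n**2
    return p

# Closed form of the partial product, from the telescoping
# (1 - 1/n^2) = ((n - 1)/n) * ((n + 1)/n).
closed_form = lambda N: (N + 1) / (2.0 * N)

p_1000 = partial_product(1000)
```

Any lower bounds $1 - \varepsilon_n$ with $\sum \varepsilon_n < \infty$ would serve equally well in the proof; the choice $\varepsilon_n = n^{-2}$ simply makes the limit explicit.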

The findings of Theorems 7.1-7.3 are highlighted in tabular form (Table 
7.1) to enhance and solidify the boundary classifications. 

Another approach to characterizing and discriminating boundary behavior 
is in terms of the semigroup of operators (11.3) and its associated infinitesimal 
operator with its domain of definition (see Section 11). We shall also develop 
there the nature of the infinitesimal operator as influenced by the boundary. 


TABLE 7.1

                              Exit      Entrance    Natural (Feller)       Regular
Σ                             < ∞       ∞           ∞                      < ∞
N                             ∞         < ∞         ∞                      < ∞
S = ∫^r dS                    < ∞       ∞           S, M ≤ ∞ and           < ∞
M = ∫^r dM                    ∞         < ∞         at least one is ∞      < ∞
φ                             1         0           0                      1
ψ                             0         1           0                      0 or 1
Pr{lim_{t→∞} X(t) = r}        > 0       0           0 or > 0               > 0

Exit: attain r in finite expected time and get stuck there.
Entrance: cannot reach r from the interior of (l, r); if at r, enter (l, r) rapidly.




8: Some Constructions of Boundary Behavior of Diffusion Processes 


There are two principal tactics in dealing with the behavior of diffusion 
processes at boundary points. An analytic delineation uses an appropriate 
second-order differential operator (the basic infinitesimal operator of the 
process; see Section 11) coupled with boundary conditions. The nature of these 
boundary constraints delimits the boundary classification. A second procedure 
operates by examining the local time process at the boundary. We emphasize 
several constructions founded on the second approach next. 

Boundary behaviors are combinations of five basic types: 


(i) absorbing barrier phenomenon, 
(ii) reflecting barrier action, 
(iii) elastic boundary structure, 
(iv) sticky boundary complex, 
(v) jump boundary behavior, and instantaneous return processes. 


A. TWO CLASSICAL EXAMPLES 


The prototypic process with an absorbing boundary is absorbing Brownian
motion, where the origin of a standard Brownian motion is converted into an
absorbing or trap state. More specifically, consider Brownian motion,
$\{Z(t), t \ge 0\}$, where the state space is confined to the positive axis subject to
the prescription that the sample paths fix at the zero position once it is
attained. The explicit transition probability density for absorbing Brownian
motion was displayed in Section 4 of Chapter 7. The infinitesimal parameters
of the process, pertinent only to the restricted state space $(0, \infty)$, naturally
coincide with those of standard Brownian motion.

Reflecting Brownian motion $\{Y(t), t \ge 0\}$ behaves as standard Brownian
motion in the interior of its domain $(0, \infty)$. However, when it reaches its zero
boundary, then the sample path returns to the interior in a manner reminiscent
of that of a light wave reflecting off a mirror. Thus, if Brownian motion is
defined on $[0, \infty)$ with a reflecting barrier, it can be viewed as the absolute
value process $Y(t) = |W(t)|$ of a standard Brownian motion $W(t)$. The state
space again consists of $[l, r) = [0, \infty)$, with 0 now acting as a regular boundary
point. The sample paths are manifestly continuous [since those of $W(t)$ are]
and at the times when the origin is hit the observed motion displays immediate
reflection to the positive axis in a continuous manner.


Local Time†


In order to construct processes depicting the boundary behavior of (iii) and (iv)
it is useful to introduce the concept of local time. To this end, we first consider
a simpler notion. For any given subset $A$ (say an open interval) contained in
the state space $(l, r)$ let

$$I_A(\zeta) = \begin{cases} 1 & \text{for } \zeta \in A,\\ 0 & \text{for } \zeta \notin A \end{cases}$$

be the indicator function of the set $A$. The occupation time of the set $A$ up to
time $t$ is defined by

$$L_A(t) = \int_0^t I_A(X(\tau))\,d\tau. \tag{8.1}$$

† Some aspects of local time and inverse local time, with applications to generalized Bessel
diffusion processes, appear in Section 12.


Because the sample paths of a diffusion process are continuous, this random
variable is well defined as an ordinary integral along each process realization.
It is an easy matter to calculate its successive moments. Specifically, for an
initial state $X(0) = x$, we have $\Pr\{X(t) \in B \mid X(0) = x\} = \int_B p(t, x, y)\,dy$, where
$p(t, x, y)$ is the transition density function of the process. We obtain

$$E_x[L_A(t)] = E_x\left[\int_0^t I_A(X(\tau))\,d\tau\right] = \int_0^t E_x[I_A(X(\tau))]\,d\tau = \int_0^t\int_A p(\tau, x, y)\,dy\,d\tau. \tag{8.2}$$

The corresponding second-moment calculation produces

$$E_x[L_A(t)^2] = E_x\left[\int_0^t I_A(X(\tau))\,d\tau\int_0^t I_A(X(u))\,du\right] = \int_0^t\int_0^t E_x[I_A(X(\tau))I_A(X(u))]\,d\tau\,du$$
$$= 2\int_0^t\int_0^\tau\left[\int_A\int_A p(u, x, \zeta)\,p(\tau - u, \zeta, y)\,d\zeta\,dy\right] du\,d\tau.$$

(An appeal to the Markov property justifies the last equation.) 
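
Formula (8.2) can be evaluated numerically whenever the transition density is known. The following sketch is an illustration only: it assumes standard Brownian motion, for which $\int_A p(\tau, x, y)\,dy$ over an interval $A = (lo, hi)$ is a difference of normal distribution functions, and it integrates over $\tau$ with a midpoint rule. The function names are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_occupation(x, lo, hi, t, n=2000):
    """E_x[L_A(t)] for A = (lo, hi) and standard Brownian motion, using (8.2).

    The inner integral over A is Pr{lo < X(tau) < hi | X(0) = x}; the outer
    integral over tau in (0, t) is approximated by the midpoint rule.
    """
    h = t / n
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) * h          # midpoints avoid tau = 0
        sd = math.sqrt(tau)
        total += std_normal_cdf((hi - x) / sd) - std_normal_cdf((lo - x) / sd)
    return h * total

# Starting at 0, the path is positive with probability 1/2 at every tau > 0,
# so the expected occupation time of (0, infinity) up to time t is t / 2.
half_line = expected_occupation(0.0, 0.0, math.inf, 2.0)
```

Infinite endpoints are passed as `math.inf`; the error function returns its limiting values there, so half-lines need no special handling.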

Returning to the concept of occupation time we specialize $A = A(a, \varepsilon)$ to be
$(a - \varepsilon, a + \varepsilon)$, an interval of length $2\varepsilon$ centered at $a$. Consider the limit random
variable

$$\varphi(t, a) = \lim_{\varepsilon \downarrow 0}\frac{1}{2\varepsilon}\,L_{A(a, \varepsilon)}(t). \tag{8.3}$$

A deep and remarkable result is that

$$\varphi(t, a) \text{ defines a family of random variables,} \tag{8.4}$$

called the local time process, valid for $t > 0$, where $a$ is an interior state point.
In the case of Brownian motion it can be further established that $\varphi(t, a)$ is
jointly continuous and nondecreasing in $t$ for almost all sample paths. The
modern development of these facts proceeds by the theory of stochastic
integrals. We discuss a bit more the nature of $\varphi(t, a)$ and some facts on
additive and multiplicative stochastic functionals in Section 12.

The random function $\varphi(t, a)$ can be construed as the density of $L_A(t)$ in the
sense that

$$L_A(t) = \int_A \varphi(t, a)\,da. \tag{8.5}$$

The representation (8.5) is commonly used and furnishes a remarkable formula
of substantial utility in many advanced theoretical deliberations on stochastic
processes.

We should like to emphasize a recondite but important fact. The local time
$\varphi(t, a)$ is a density and is not the same quantity as the occupation time of a
point,

$$L_{\{a\}}(t) = \int_0^t I_{\{a\}}(X(\tau))\,d\tau, \tag{8.6}$$

where

$$I_{\{a\}}(u) = \begin{cases} 1, & u = a,\\ 0, & u \ne a. \end{cases}$$

The right-hand integral in (8.6) is usually identically zero. This happens, in
particular, for the case of Brownian motion where the integral vanishes but the
local time process $\varphi(t, a)$ can be defined such that $\varphi(t, a)$ is strictly increasing in
the neighborhood of $t = 0$ for almost all sample paths starting from $X(0) = a$.


Random Time Changes 


We wish to develop the concept of a random time change in order to perform 
constructions of diffusion processes exhibiting special boundary behavior. The 
concept of time change has wide ramifications far beyond the applications 
reviewed here. 

Let $A(t)$ be a continuous increasing random function with $A(0) = 0$. We
sometimes indicate the dependence on the sample path $\omega$ by writing $A(t, \omega)$
rather than $A(t)$. Corresponding to each trajectory $\omega$ let $A^{-1}(s)$, $s \ge 0$, denote
the inverse function of $A(t)$, defined formally by

$$A^{-1}(s) = \inf\{t : A(t) > s\}.$$

As an example consult Fig. 1. A constant piece for $A(\cdot)$ induces a jump
discontinuity in $A^{-1}(s)$. It can be checked routinely that $A^{-1}(s)$ is increasing
and right continuous (not necessarily continuous). It suffices to assume that
$A(t)$ is right continuous and increasing and then $A^{-1}(s)$ retains the same
properties. In this case a jump discontinuity of $A(t)$ gives rise to a constant
portion for $A^{-1}(s)$.

FIG. 1
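
The inverse $A^{-1}(s) = \inf\{t : A(t) > s\}$ is easy to experiment with on a grid. The sketch below is a discretized illustration under stated assumptions: $A$ is known only through nondecreasing samples at increasing grid times, the infimum is taken over grid points, and the function name is hypothetical.

```python
import bisect
import math

def right_inverse(times, values, s):
    """Grid version of inf{t : A(t) > s}.

    times  : increasing grid points t_0 < t_1 < ...
    values : nondecreasing samples A(t_0), A(t_1), ...
    Returns math.inf when no sampled value exceeds s.
    """
    i = bisect.bisect_right(values, s)   # index of the first value > s
    return times[i] if i < len(times) else math.inf

# A is flat (equal to 1) on the grid points 1, 2, 3.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [0.0, 1.0, 1.0, 1.0, 2.0]

before_flat = right_inverse(times, values, 0.99)   # reaches the flat piece
after_flat = right_inverse(times, values, 1.0)     # must skip the flat piece
```

Here the flat stretch of the samples makes the inverse jump from 1.0 to 4.0 as $s$ passes the level 1, which is the grid analog of the jump discontinuity described above.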

Let $X(t)$ be a strong Markov process. A process $A(t, \omega)$ is called an additive
functional associated with $X(t)$ if

$$A(t + \tau, \omega) = A(t, \omega) + A(\tau, \omega_t^+), \tag{8.7}$$

where $\omega_t^+$ denotes the sample path translated $t$ time units. (Specifically, if $\omega$ is
a sample path, then $\omega_t^+$ refers to the same sample path but viewed over the time
duration subsequent to $t$.) The function (8.1) provides an example of an additive
increasing functional. We validate this assertion now.

Consider

$$L_A(t + \tau, \omega) = \int_0^{t + \tau} I_A(X(u, \omega))\,du = \int_0^t I_A(X(u, \omega))\,du + \int_t^{t + \tau} I_A(X(u, \omega))\,du. \tag{8.8}$$
Implementing an obvious change of variables gives
$$\int_t^{t + \tau} I_A(X(u))\,du = \int_0^\tau I_A(X(u + t))\,du. \tag{8.9}$$
But the clear interpretations show that
$$X(u + t) = X(u + t, \omega) = X(u, \omega_t^+). \tag{8.10}$$

Inserting this relationship into (8.9) and then (8.8) we verify instantly that
$L_A(t, \omega)$ satisfies (8.7).
Another example of a continuous additive functional is 


$$G(t, \omega) = G(t) = g(X(t)) - g(X(0)),$$


where g is a continuous function defined on the state space. Further applications 
of additive functionals and associated martingales are developed in Section 12. 
With the availability of an increasing continuous additive functional with
respect to $\{X(t)\}$, designated by $A(t) = A(t, \omega)$, we construct the new process

$$Y(t) = X(A^{-1}(t)). \tag{8.11}$$


It is a nontrivial fact that Y(t) determines a strong Markov process with right 
continuous sample paths. The proof of this assertion is intricate and well 
beyond the scope of this book. The above result will be used without further 
validation to enable us to construct diffusions displaying special boundary 
behavior. 


B. A DIFFUSION WITH AN ELASTIC BOUNDARY 


An elastic boundary manifests the properties of both a reflecting and an
absorbing boundary. For example, the left endpoint $l$ is an elastic boundary if
it is a regular boundary which behaves as a reflecting boundary for a certain
random duration after which the process is killed.

We provide a concrete construction of an elastic boundary by confining
attention to certain special paths of the standard Brownian motion process.
Consider standard Brownian motion $\{B(t), t \ge 0\}$ and fix a level $-c < 0$. Let
$T_{-c}$ denote the first passage time to the point $-c$. That is, for a given sample
path $\omega$, $T_{-c}(\omega)$ indicates the time when the value $-c$ is first attained. Formally,

$$T_{-c} = \inf\{t \ge 0 : B(t) = -c\}.$$

The fact that $T_{-c}$ is finite with probability 1 was established in Chapter 7. A
realization $\omega$ of a Brownian path until time $T_{-c}$ is depicted in Fig. 2.

Now let $W_{-c}(t) = B(t)$ for $t < T_{-c}$, and undefined for $t \ge T_{-c}$. Thus $W_{-c}(t)$ is
the Brownian motion killed upon first reaching $-c$. It remains a diffusion
process.

Construct now the additive increasing functional

$$C(t) = C(t, \omega) = \int_0^t I_{(0, \infty)}(B(u))\,du, \tag{8.12}$$

which is exactly the occupation time of the positive axis. For any sample path $\omega$,
$C(t)$ is certainly continuous and increasing but obviously strictly increasing
only over the time durations where $B(t) > 0$, i.e., at the times when the process
occupies the positive axis.

FIG. 2
Manifestly, $C(t)$ is constant during the excursions of $B(t)$ to the negative
axis. Consider now the inverse $C^{-1}(s)$. Notice that this function jumps
upwards (displays a discontinuity) for the time periods when $B(u)$ visits the
negative axis. Now form the process

$$Z(s) = W_{-c}(C^{-1}(s)), \qquad 0 \le s < C(T_{-c}), \tag{8.13}$$

where $W_{-c}(t)$ is the Brownian motion process run only until the stopping time
$T_{-c}$. Under this convention the $Z(s)$ process retains the strong Markov property.
(It is tempting to define $Z(s) = 0$ for $s \ge C(T_{-c})$, but such a designation
confounds this event with the events of contact with the zero point prior to
time $C(T_{-c})$, and the process no longer possesses the strong Markov property.)

As stated immediately following (8.11), the $Z(s)$ process is strong Markov.
Moreover, as the time $s > 0$ proceeds, a jump will occur in the value of $C^{-1}(s)$
exclusively where the $B(u)$ trajectory sojourns on the negative axis. For the
obvious reasons the process $C^{-1}(s)$ is referred to as a random time change. In
more picturesque language, the clock of $C^{-1}(s)$ runs only during the excursion
periods of $B(u)$ on the positive axis and effectively ignores the time periods
where $B(u)$ traverses the negative axis. It is wise to consult the schematization
shown in Fig. 3. Manifestly the $Z(s)$ process terminates (is killed) at the
moment $\zeta$ equivalent to when the $W_{-c}(t)$ path crosses zero for the last time before
reaching the level $-c$. The thin curve represents a trajectory of the Brownian
process stopped at the hitting time of the level $-c$. The arcs of the trajectory
where $W_{-c}(u) > 0$ but prior to $T_{-c}$ remain intact since $C^{-1}(s)$ strictly increases
during these time periods. The corresponding trajectory of $Z(s)$ is precisely the
thick portion of the curve. The $Z$ process is killed after the time $\zeta = C(T_{-c})$ in
view of the fact that the Brownian path $W_{-c}$ then moves on the negative axis
until it crosses the level $-c$. It is important to observe that at a jump point of
$C^{-1}(s)$ the initial and final value of the Brownian path are both zero, i.e.,
$W_{-c}(C^{-1}(s)) = 0$ at these points. It follows in view of this comment that every
path of $Z(s) = W_{-c}(C^{-1}(s))$ is continuous. Indeed, where a path $\omega$ traverses the
positive axis the $Z$ trajectory coincides with its antecedent Brownian path, and
where $\omega$ describes motion over the negative axis but prior to time $T_{-c}$ these
time durations produce jumps in $C^{-1}(s)$ with the consistent value 0 for $Z$. The
specific path of $Z(s)$ emanating from the example of Fig. 3 is described in Fig.
4. Thus the boundary zero acts as a reflecting barrier point part of the time
such that, upon contact, instantaneous continuous return to the positive axis
transpires. This possibility occurs a random number of times, but at some
attainment of zero the process is killed.

FIG. 3

FIG. 4

To sum up, a typical path of Z(s) ranges on the positive axis with a number 
of continuous reflections from the zero boundary until finally culminating with 
killing at the origin. 


C. A STICKY BOUNDARY 


Consider a diffusion process {X(t), t ≥ 0} on the state space [l, r). A regular
boundary l is said to be sticky if the occupation time of the process at l


$$L_{\{l\}}(t) = \int_0^t I_{\{l\}}(X(s))\, ds \tag{8.14}$$


is positive for all t > 0, where X(0) = l. As pointed out earlier, the occupation
time at a point is not the same as the local time.

To set forth a concrete case of a sticky boundary, take Y(t) as reflecting
Brownian motion, specifically Y(t) = |B(t)| where B(t) is standard Brownian
motion. We take $\varphi(t) = \varphi_{\{0\}}(t)$ to be the local time at the origin [see (8.4)]. Of
course, as indicated earlier, φ(t) determines an additive increasing functional.
We now form the new additive increasing process with respect to Y(t),


$$U(t) = t + \kappa \varphi(t), \tag{8.15}$$


where κ is a fixed positive constant. Observe that U(t) increases continuously
for each sample path from 0 to ∞ [since the φ(t) process is continuous
nondecreasing]. We also see that dU(t)/dt can differ from 1 only at the time
points where a path contacts the origin.

Using $U^{-1}(s)$ as a random time change superimposed on Y(t) we construct
the new process


$$\Gamma(s) = Y(U^{-1}(s)). \tag{8.16}$$


Obviously, Γ(s) inherits the state space [l, r) = [0, ∞) of reflecting Brownian
motion. Because 0 is a regular boundary point of Y(t) = |B(t)|, it follows
(why?) that 0 is a regular boundary of the Γ(s) process. We now characterize
its behavior at the origin by evaluating the occupation time process for Γ(s) at
{0}. (Some of the manipulations will be done without full justifications.)
Consider


$$L_{\{0\}}(t; \Gamma) = \int_0^t I_{\{0\}}(\Gamma(s))\, ds = \int_0^t I_{\{0\}}(Y(U^{-1}(s)))\, ds. \tag{8.17}$$


Executing the change of variable (for each sample path) s = U(τ) in (8.17)
yields


$$L_{\{0\}}(t; \Gamma) = \int_0^{U^{-1}(t)} I_{\{0\}}[Y(\tau)]\, dU(\tau)
= \int_0^{U^{-1}(t)} I_{\{0\}}(Y(\tau))\, d\tau + \kappa \int_0^{U^{-1}(t)} I_{\{0\}}(Y(\tau))\, d\varphi(\tau). \tag{8.18}$$


The first integral on the right, corresponding to the occupation time of the set 
{0} for the Brownian motion process evaluated at time $U^{-1}(t)$, is identically
zero. In essence, the actual occupation time as measured by the first integral 
(caution, this is not the local time functional) of a specific state point for a 
Brownian path over a given time epoch is zero. 

Next we examine the second integral. By its very definition, φ(t) concentrates
all its increase only at the time points where the Y(t) process contacts
the origin. In other words dφ(t) > 0 only where $I_{\{0\}}(Y(t)) = 1$. Accordingly the
final integral in (8.18) reduces to


$$\int_0^{U^{-1}(t)} I_{\{0\}}(Y(\tau))\, d\varphi(\tau) = \int_0^{U^{-1}(t)} d\varphi(\tau) = \varphi(U^{-1}(t)). \tag{8.19}$$


It was pointed out following Eq. (8.6) that for a Brownian path emanating
from the origin, the local time process φ(t, 0) is strictly positive for all t > 0.
Therefore, the quantity in (8.19) is positive, and since κ > 0 by stipulation, we
find that the occupation time functional of (8.17) is positive for all t > 0. Thus,
the barrier {0} in the Γ(s) process fulfills the properties of the definition of a
sticky boundary.
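
The time change (8.15)-(8.19) can be explored numerically. The following is only a rough sketch under stated assumptions, not part of the original development: reflecting Brownian motion is replaced by a reflected simple random walk with step sqrt(dt), and the local time φ(t) is approximated by sqrt(dt) times the number of visits to 0 (an assumed normalization; conventions for local time differ by constant factors). The names `simulate_sticky`, `gamma`, and `occupation_at_zero` are hypothetical.

```python
import bisect
import math
import random


def simulate_sticky(n_steps, dt, kappa, seed=0):
    """Reflected random walk Y on a grid plus the clock U(t) = t + kappa*phi(t).

    Assumption: phi(t) is approximated by dx * (visits of Y to 0 before t),
    with dx = sqrt(dt); this normalization is illustrative only.
    """
    rng = random.Random(seed)
    dx = math.sqrt(dt)
    pos, visits = 0, 0
    states, clock = [], []
    for k in range(n_steps):
        states.append(abs(pos) * dx)                 # Y(t_k)
        clock.append(k * dt + kappa * visits * dx)   # U(t_k)
        if pos == 0:
            visits += 1                              # local time grows only at 0
        pos += rng.choice((-1, 1))
    end = n_steps * dt + kappa * visits * dx         # U at the final time
    return states, clock, end, visits * dx


def gamma(s, states, clock):
    """Gamma(s) = Y(U^{-1}(s)): state held during the clock interval containing s."""
    k = bisect.bisect_right(clock, s) - 1
    return states[k]


def occupation_at_zero(states, clock, end):
    """Total s-time that Gamma spends at 0 up to the clock value `end`."""
    total = 0.0
    for k, y in enumerate(states):
        nxt = clock[k + 1] if k + 1 < len(clock) else end
        if y == 0.0:
            total += nxt - clock[k]
    return total


states, clock, end, phi = simulate_sticky(n_steps=100_000, dt=1e-4, kappa=2.0, seed=1)
occ = occupation_at_zero(states, clock, end)
print(occ, 2.0 * phi)  # occupation time at 0 versus kappa * (approximate local time)
```

In this discretization each visit to 0 holds the time-changed walk for dt + κ·sqrt(dt) units of the new clock, so the occupation time at 0 is dominated by κ times the approximate local time, in line with (8.18) and (8.19).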


D. JUMP DISCONTINUITY AT A BOUNDARY 


Consider a regular diffusion process on (l,r) and suppose l is a regular 
boundary point. As has often been stated, the infinitesimal rates governing the 
sample path realizations apply only at the interior points of (l, r). The behavior 
at a boundary point l requires a separate specification. A feasible pattern is the 
following. 

When l is reached, the process waits there for a random duration (following
an exponential distribution) and leaves l by a jump into (l, r) according to a
given distribution. The probabilistic transitions for the movement inside (l, r)
are thereafter guided by the infinitesimal parameters until the next attainment
of the boundary. The waiting and jump effects at successive visits to l are
independent. To sum up, the process behavior consists of two types of epochs:
in the first, continuous movement occurs until the boundary is attained,
and in the second, at each boundary visit, an exponentially distributed sojourn
transpires, culminating with a jump return into (l, r).

It is of interest to contrast the nature of the sojourn and jump phenomena 
possible from a boundary point and to ask whether such behavior can be 
manifested at interior points of the state space. Certainly, a diffusion process, 
which by definition necessarily has continuous sample paths cannot exhibit 
jump discontinuities. The question remains whether there can exist a state c in 
(l, r) such that, at each attainment of c, a random independently distributed 
sojourn time is registered at this point before moving on in a continuous manner. 
Of course, in order to preserve the Markov character of the process, the sojourn 
time at any given state is necessarily exponentially distributed (cf. Chapter 14, 
Section 3). 

Consider the process beginning at c and suppose it waits a random period 
at c and moves off continuously. We shall establish presently that the strong 
Markov property coupled with the fact that X(t) is a diffusion precludes the 
existence of a state c at which such a random waiting period can appear. Let $\eta_c$
denote the first departure time from the position c, where X(0) = c. We claim
that $\eta_c$ is a Markov time. In order to validate this assertion it is necessary to
show that $\{\omega : \eta_c(\omega) < t\} \in \mathcal{F}_t$ for each t; in other words we must show that to
decide whether the event described by the inequality $\eta_c < t$ holds or not it is
enough to know the process values X(s) for all s ≤ t. But, in view of the
continuous paths, $\eta_c < t$ means there exists some rational s < t where X(s) ≠ c.
Obviously the event X(s) ≠ c is measurable with respect to $\mathcal{F}_s \subset \mathcal{F}_t$.
It follows that $\eta_c$ is a stopping time as stated.

Now let A be a Borel subset of (l, r). We shall now show that 


$$\Pr\{X(\eta_c + t) \in A \mid X(\eta_c), X(0) = c\} \ne \Pr\{X(t) \in A \mid X(0) = c\}, \tag{8.20}$$


violating the strong Markov property. In fact, let p(t, a, x) denote the transition 
density function of the X(t) process. That is, 


p(t, c, x) dx = Pr{x < X(t) < x + dx| X(0) = c}. 


Since the path is continuous, we infer that $X(\eta_c) = c$. By the law of total
probabilities and the strong Markov property, the left-hand side of (8.20)
becomes


$$\int_0^t \left( \int_A p(t - \tau, c, y)\, dy \right) \lambda e^{-\lambda \tau}\, d\tau. \tag{8.21}$$


Here λ is the rate of the exponentially distributed sojourn time $\eta_c$.
On the other hand, the right-hand side of (8.20) equals


$$\int_A p(t, c, y)\, dy, \tag{8.22}$$


and the equality of (8.21) and (8.22) is only possible if λ = ∞, meaning that
movement through c is instantaneous. The contradiction in (8.20) can be
avoided only if $E[\eta_c] = 0$ for all c in (l, r), or alternatively the sample paths are
allowed to be discontinuous. In particular, for a diffusion process nonzero 
sojourn epochs cannot transpire at interior points of the state space. 


E. INSTANTANEOUS RETURN PROCESS 


Consider a standard Brownian motion {B(t), t ≥ 0} on the real line, with
B(0) = 0. Fix the interval (-1, 1). Construct the return process Z(t) agreeing
with B(t) until the point 1 or -1 is reached. At that time, the particle is
instantaneously returned to the origin, starting the Brownian motion afresh.
This procedure is repeated at each attainment of 1 or -1, so that Z(t) is a
stochastic process on the state space I = (-1, 1). Intuitively, it is expected that
a limiting and stationary distribution of the return process exists. In this 
example, we find the form of this limiting distribution. 

More generally, consider a regular diffusion {X(t), t ≥ 0} on the interval
I = (l, r), and let l < a < b < r. We construct the return process Z(t) relative to
[a, b] as follows. Starting at a point $x_0$ in (a, b), the particle is returned
instantaneously to $x_0$ whenever a or b is reached. After such a return, the
subsequent motion of the process behaves just like X(t); this process is
repeated at each attainment of level a or b. The resulting process Z(t) consists
of recurrent cycles of random time duration $C_1, C_2, C_3, \ldots$, where the $C_i$ are
independently and identically distributed, with the same distribution as
$T_{a,b} = \min\{T_a, T_b\}$, the first exit time from the interval (a, b), starting from $x_0$. It
follows that


$$E[C_i \mid X(0) = x_0] = \int_a^b G(x_0, \xi)\, d\xi, \tag{8.23}$$


where $G(x_0, \xi) = G_{(a,b)}(x_0, \xi)$ is the Green function of the process X(t) relative
to the interval (a, b); this is displayed explicitly in (3.15).
Let q(t, y) be the density function of Z(t). That is,


$$q(t, y)\, dy = \Pr\{y < Z(t) < y + dy \mid Z(0) = x_0\}.$$

The objective is to evaluate $\alpha(y) = \lim_{t \to \infty} q(t, y)$, the limiting density of Z(t).


To do this, we fix an interval $[y_1, y_2]$ with $a < y_1 < y_2 < b$ and define the
process {I(t), t ≥ 0} by


$$I(t) = \begin{cases} 1 & \text{if } y_1 < Z(t) < y_2, \\ 0 & \text{otherwise.} \end{cases} \tag{8.24}$$


In the language of renewal theory, I(t) is on if $y_1 < Z(t) < y_2$, and otherwise I(t)
is off. We can now split a typical cycle (of length $C_1$, say) into two parts: in one
part, I(t) is on, in the other I(t) is off (see Fig. 5). We have
$\Pr\{I(t) = 1\} = E[I(t)] = \int_{y_1}^{y_2} q(t, y)\, dy$, and we can now appeal to the renewal
theorem as used in Section 7.C of Chapter 5 to deduce that


$$\lim_{t \to \infty} \Pr\{I(t) = 1\} = \frac{E[\text{time spent in } (y_1, y_2) \text{ in a cycle} \mid Z(0) = x_0]}{E[\text{time duration of a cycle} \mid Z(0) = x_0]}. \tag{8.25}$$


But we showed in Section 3 that the numerator is $\int_{y_1}^{y_2} G(x_0, \xi)\, d\xi$, while,
from (8.23), the denominator is just $\int_a^b G(x_0, \xi)\, d\xi$. Hence


$$\int_{y_1}^{y_2} \alpha(y)\, dy = \frac{\int_{y_1}^{y_2} G(x_0, \xi)\, d\xi}{\int_a^b G(x_0, \xi)\, d\xi}.$$


Since this holds for every $y_1, y_2$ in (a, b), we may deduce that


$$\alpha(y) = \lim_{t \to \infty} q(t, y) = \frac{G(x_0, y)}{\int_a^b G(x_0, \xi)\, d\xi}. \tag{8.26}$$

Example. Take {X(t)} to be a standard Brownian motion with X(0) = 0,
a = -1, b = +1. Then the limiting density is α(y) = 1 - |y|, -1 < y < 1.
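
The example can be checked numerically from (8.26). The sketch below is illustrative only. It assumes that, for standard Brownian motion (scale function S(x) = x and unit infinitesimal variance), the Green function of (3.15) specializes to 2(x - a)(b - ξ)/(b - a) for x ≤ ξ and 2(ξ - a)(b - x)/(b - a) for ξ ≤ x; that formula is not reproduced in this section, and the function names are hypothetical.

```python
def green_bm(x, xi, a, b):
    """Assumed Green function of standard Brownian motion on (a, b)."""
    if x <= xi:
        return 2.0 * (x - a) * (b - xi) / (b - a)
    return 2.0 * (xi - a) * (b - x) / (b - a)


def integrate(f, lo, hi, n=2000):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def limiting_density(y, x0, a, b):
    """alpha(y) = G(x0, y) / int_a^b G(x0, xi) d xi, as in (8.26)."""
    mean_cycle = integrate(lambda xi: green_bm(x0, xi, a, b), a, b)
    return green_bm(x0, y, a, b) / mean_cycle


# Compare the numerical density with 1 - |y| for x0 = 0, a = -1, b = 1.
for y in (-0.5, 0.0, 0.25, 0.9):
    print(y, limiting_density(y, 0.0, -1.0, 1.0), 1.0 - abs(y))
```

The denominator is the mean cycle length of (8.23); for Brownian motion started at 0 in (-1, 1) it equals 1, so the limiting density coincides with the Green function itself.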


9: Conditioned Diffusion Processes 


Consider a regular diffusion process {X(t), t ≥ 0} with state space [0, 1] and
infinitesimal mean and variance μ(x) and σ²(x), respectively. Assume that 0 and
1 are exit boundaries, and let p(t, x, y) be the transition density of X(t). It is of
interest, especially in certain biological contexts, to concentrate only on the
realizations of the process that lead to absorption at 1. Accordingly,
{X*(t), t ≥ 0} is prescribed to be the process confined to the sample paths
involving ultimate absorption at 1. That X*(t) exhibits only continuous paths
is clear, since X*(t) is merely a restriction of X(t) to part of its original sample
space. Moreover, X*(t) is a Markov process since X(t) is, and past history
(beyond the current state) cannot affect where absorption occurs (the reader
should apply a formal argument). It follows that X*(t) is a diffusion process.
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For the X*(t) process it is intuitive that the boundary point 1 remains an exit 
boundary, while the boundary point 0 is converted into an entrance boundary. 
We will justify these conclusions later. 

We suppose hereafter that the diffusion X(t) satisfies the moment con- 
ditions of (1.4), and we shall refer to X *(t) as an associated conditioned diffusion 
process. The one constructed above is a special case of a large class of such 
processes. 

Our immediate task will be to determine the infinitesimal parameters of the 
process {X*(t),t > 0}, and subsequently we shall highlight a number of 
examples and applications. 

To proceed more formally, we assume that the boundaries are attainable in
finite expected time, i.e., $E[T_0 \wedge T_1] < \infty$. (As usual, $T_z$ denotes the first
hitting time of the point z.)

We define


$$s(x) = \exp\left\{ -\int^x \frac{2\mu(\xi)}{\sigma^2(\xi)}\, d\xi \right\} \quad \text{and} \quad S(x) = \int_0^x s(\xi)\, d\xi.$$


Because both boundaries are exit by assumption, S(x) and even S(1) are finite
and $\Pr\{T_1 < T_0 \mid X(0) = x\} = S(x)/S(1)$, for 0 ≤ x ≤ 1.
To simplify the development we assume that


$$S''(x) = s'(x) = -[2\mu(x)/\sigma^2(x)]\, s(x) \quad \text{is bounded for } 0 \le x \le 1.$$


Let p*(t, x, y) be the transition density function of the conditioned process
X*(t), or equivalently, the density of the X(t) process, conditioned on the event
$T_1 < T_0$. Since the conditioning event has positive probability, we obtain from
Bayes' rule


$$p^*(t, x, y)\, dy = \Pr\{y < X(t) < y + dy \mid X(0) = x,\ T_1 < T_0\}
= \frac{\Pr\{y < X(t) < y + dy \mid X(0) = x\}\, \Pr\{T_1 < T_0 \mid X(0) = x,\ X(t) = y\}}{\Pr\{T_1 < T_0 \mid X(0) = x\}}.$$


By the Markov property

$$\Pr\{T_1 < T_0 \mid X(0) = x,\ X(t) = y\} = \Pr\{T_1 < T_0 \mid X(0) = y\} = S(y)/S(1).$$

Combining, we have


$$p^*(t, x, y) = \frac{p(t, x, y)\, S(y)}{S(x)}. \tag{9.1}$$

Consider now


$$\mu^*(\xi) = \lim_{h \downarrow 0} \frac{1}{h} E[X^*(h) - X^*(0) \mid X^*(0) = \xi]
= \lim_{h \downarrow 0} \frac{1}{h} \int_0^1 p^*(h, \xi, y)(y - \xi)\, dy
= \lim_{h \downarrow 0} \frac{1}{h} \int_0^1 \frac{p(h, \xi, y)\, S(y)\, (y - \xi)}{S(\xi)}\, dy. \tag{9.2}$$


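A Taylor expansion of S(y) about y = ξ (S is differentiable at least twice by
definition) leads to


$$S(y) = S(\xi) + (y - \xi)\, s(\xi) + \frac{(y - \xi)^2}{2}\, s'(z), \tag{9.3}$$


where z is in the interval (ξ, y) and depends, of course, on ξ and y.
Substituting from (9.3) into (9.2) yields


$$\mu^*(\xi) = \lim_{h \downarrow 0} \frac{1}{h} \left\{ \int_0^1 (y - \xi)\, p(h, \xi, y)\, dy + \frac{s(\xi)}{S(\xi)} \int_0^1 (y - \xi)^2 p(h, \xi, y)\, dy
+ \frac{1}{2 S(\xi)} \int_0^1 (y - \xi)^3 p(h, \xi, y)\, s'(z)\, dy \right\}. \tag{9.4}$$


Using the assumptions that $\lim_{h \downarrow 0} (1/h) E[|X(h) - X(0)|^3 \mid X(0) = \xi] = 0$ and
that $S''(z) = s'(z)$ is bounded, (9.4) reduces to


$$\mu^*(\xi) = \mu(\xi) + \frac{s(\xi)}{S(\xi)}\, \sigma^2(\xi). \tag{9.5a}$$


Consider next


$$\sigma^{*2}(\xi) = \lim_{h \downarrow 0} \frac{1}{h} \int_0^1 (y - \xi)^2 p^*(h, \xi, y)\, dy
= \lim_{h \downarrow 0} \frac{1}{h} \left\{ \int_0^1 (y - \xi)^2 p(h, \xi, y) \left[ 1 + (y - \xi) \frac{s(z)}{S(\xi)} \right] dy \right\}
= \sigma^2(\xi), \tag{9.5b}$$


since s is bounded.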
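
As a numerical illustration of (9.5a) and (9.5b), the following sketch computes the conditioned parameters from user-supplied μ and σ². It is a minimal example under stated assumptions: the scale density is obtained by midpoint-rule quadrature from an arbitrary reference point `x_ref` (a hypothetical parameter; the choice only rescales s and S by a common constant), and the function names are illustrative.

```python
import math


def integrate(f, lo, hi, n=400):
    """Composite midpoint rule (signed, so hi < lo is allowed)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def conditioned_parameters(mu, sigma2, x, x_ref=0.5):
    """Return (mu*(x), sigma*^2(x)) following (9.5a) and (9.5b).

    s(v) = exp(-int_{x_ref}^{v} 2 mu / sigma^2) and S(x) = int_0^x s.
    The reference point x_ref cancels in the ratio s/S.
    """
    def s(v):
        return math.exp(-integrate(lambda u: 2.0 * mu(u) / sigma2(u), x_ref, v))

    S = integrate(s, 0.0, x)
    return mu(x) + sigma2(x) * s(x) / S, sigma2(x)


# Driftless diffusion with sigma^2(x) = x(1 - x): here S(x) = x, so mu*(x) = 1 - x.
mu_star, var_star = conditioned_parameters(lambda x: 0.0, lambda x: x * (1.0 - x), 0.3)
print(mu_star, var_star)
```

The conditioning leaves the infinitesimal variance unchanged and adds the drift term σ²(x)s(x)/S(x), which pushes the path toward the boundary 1.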


We are now in a position to be able to classify the boundary behavior of the 
conditioned process X*(t). It follows from the definitions ((3.5)-(3.7)) that the 
scale functions S*(x) and s*(x), and the speed density m*(x) of the conditioned 
process are given by 


$$s^*(x) = \frac{s(x)}{[S(x)]^2}, \qquad S^*(x) = -\frac{1}{S(x)}, \tag{9.6}$$
$$m^*(x) = [S(x)]^2\, m(x). \tag{9.7}$$


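We shall now show that 0 is an entrance boundary for X*(t). To do this, we
need to show that


$$S^*(0, \eta] = \int_0^\eta s^*(\xi)\, d\xi = \infty \quad \text{and} \quad N^*(0) = \int_0^b \int_\eta^b dS^*(\xi)\, dM^*(\eta) < \infty.$$


But, because S(0) = 0, it follows that


$$\int_0^\eta dS^*(\xi) = \int_0^\eta \frac{s(\xi)}{S^2(\xi)}\, d\xi = \lim_{a \downarrow 0} \left[ \frac{1}{S(a)} - \frac{1}{S(\eta)} \right] = \infty.$$

Next,


$$N^*(0) = \int_0^b \int_\eta^b dS^*(\xi)\, dM^*(\eta)
= \int_0^b \left[ \frac{1}{S(\eta)} - \frac{1}{S(b)} \right] S^2(\eta)\, m(\eta)\, d\eta
\le \int_0^b S(\eta)\, m(\eta)\, d\eta = \int_0^b \int_0^\eta dS(\xi)\, m(\eta)\, d\eta = \Sigma(0) < \infty$$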


since 0 is an exit boundary for X(t). We have now established that 0 is an 
entrance boundary for X*(t). The proof that state 1 remains an exit boundary 
is similar and is left as an exercise. 




GREEN FUNCTIONS FOR THE CONDITIONED PROCESS 


We can define the Green function on the interval (0,1] for the conditioned 
process using the results of (9.6) and (9.7). Direct substitution into (3.15) using 
the fact that S*(x) = —[S(x)]~! leads to 


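$$G_*(x, \xi) = \begin{cases}
\dfrac{2[S(1) - S(\xi)]\, S(\xi)}{S(1)\, \sigma^2(\xi)\, s(\xi)} & \text{for } 0 \le x \le \xi \le 1, \\[2ex]
\dfrac{2[S(1) - S(x)]\, S^2(\xi)}{S(1)\, S(x)\, \sigma^2(\xi)\, s(\xi)} & \text{for } 0 \le \xi \le x \le 1.
\end{cases} \tag{9.8}$$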


Here, we have used the subscript * to indicate conditional Green functions,
as opposed to the functions $G^*(x, \xi)$ of (3.30). Then $G_*(x, \xi)\, d\xi$ is interpreted as
the mean time the conditioned process spends in the interval [ξ, ξ + dξ) prior
to the absorption time $T_1^*$. A formal derivation proceeds by first considering
the Green function on a subinterval [a, b] of (0, 1) and letting a ↓ 0 and b ↑ 1.
Because 0 is an entrance boundary for X*, we have

$$\lim_{a \downarrow 0,\ b \uparrow 1} T_a^* \wedge T_b^* = T_1^*.$$

(Compare to Example C of Section 4. See also Section 6.)

We can use the above result to compute, for example, the mean of the
absorption time $T_1^*$ when the exit boundary is reached. We proceed as follows.


$$E_x[T_1^*] = \int_0^1 G_*(x, \xi)\, d\xi
= \frac{2[S(1) - S(x)]}{S(1)\, S(x)} \int_0^x \frac{S^2(\xi)}{\sigma^2(\xi)\, s(\xi)}\, d\xi
+ \frac{2}{S(1)} \int_x^1 \frac{S(\xi)[S(1) - S(\xi)]}{\sigma^2(\xi)\, s(\xi)}\, d\xi, \tag{9.9}$$

using (9.8). This should be compared to (3.12).
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Example 1. Wright—Fisher Diffusion with No Mutation or Selection. Recall 
that X(t) is a diffusion on [0, 1] with infinitesimal parameters μ(x) = 0,
σ²(x) = x(1 - x). We verified in Section 4, Example C, that S(x) = x, so that
for the X*(t) process we have


$$\mu^*(x) = 1 - x, \qquad \sigma^{*2}(x) = x(1 - x). \tag{9.10}$$


The expected time $E[T_1^*]$ to reach 1 for the conditioned process starting at x
is given, using (9.10), by


$$E[T_1^*] = \frac{2(1 - x)}{x} \int_0^x \frac{y}{1 - y}\, dy + 2 \int_x^1 dy = -\frac{2(1 - x)}{x} \log(1 - x). \tag{9.11}$$
Interpreting the units of time in the standard way such that one unit of the
diffusion reflects about N generations of the discrete approximating model, we
infer that the expected time until fixation under the initial condition x = 1/N
(i.e., a single mutant type) is approximately


$$-N \cdot \frac{2[1 - (1/N)]}{1/N} \log[1 - (1/N)] \approx 2N. \tag{9.12}$$


Among population genetics investigators, (9.12) is commonly replaced by 4N;
the extra factor of 2 is attributable to the fact that population size consists of
2N genes instead of N individuals.
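
The closed form for the conditional mean fixation time can be verified by evaluating the two integrals of (9.9) directly. The sketch below is illustrative: it hard-codes S(x) = x, s(x) = 1, σ²(x) = x(1 - x) for this example, uses a simple midpoint rule, and the function names are hypothetical.

```python
import math


def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def mean_fixation_time(x):
    """E_x[T_1^*] from (9.9) with S(x) = x, s(x) = 1, sigma^2(x) = x(1 - x)."""
    first = 2.0 * (1.0 - x) / x * integrate(lambda y: y / (1.0 - y), 0.0, x)
    second = 2.0 * integrate(lambda y: 1.0, x, 1.0)
    return first + second


def closed_form(x):
    """The expression -(2(1 - x)/x) log(1 - x)."""
    return -2.0 * (1.0 - x) / x * math.log(1.0 - x)


N = 1000
print(mean_fixation_time(0.5), closed_form(0.5))
print(N * closed_form(1.0 / N), 2 * N)  # generations until fixation versus 2N
```

For a single initial mutant (x = 1/N) the diffusion-time answer is close to 2, which becomes roughly 2N generations after rescaling time by N.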


Example 2. An Infinite Allele Neutral Mutation Genetic Model. We assume 
that in each generation an existing type may mutate to a new type with 
probability u. We focus attention on a specific allelic type A currently 
represented in the population and consider a population of N individuals
composed of i A-types and N - i non-A-types. Postulating that only mutation
and random sampling effects operate, the mean proportion of A-types after
mutation is $\pi_i = (i/N)(1 - u)$.

Let $X_n$ be the number of A individuals in generation n. In line with the
Wright-Fisher formulation of Section 2, Example F, the next generation is
determined by N binomial trials, such that


$$\Pr\{X_{n+1} = j \mid X_n = i\} = \binom{N}{j} \pi_i^j (1 - \pi_i)^{N - j}, \qquad 0 \le i, j \le N.$$


Clearly, state 0 is absorbing, whereas state N is not. This follows because 
mutation will ultimately cause the specific A-type to disappear, so that a 
population configuration consisting exclusively of A-types can last only 
temporarily. We say that a quasi-fixation occurs at some time n if $X_n = N$,
signifying that the population comprises only A-types. It is of some interest in
biological applications to evaluate (at least to a good approximation for large
N) the probability of quasi-fixation, and the expected time to quasi-fixation,
conditional on this event occurring. We can do this by calculating the
corresponding quantities for appropriate diffusion approximations.
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In the spirit of the derivations of the diffusion approximations of Section 2, 
we obtain a limiting diffusion process {X(t), t ≥ 0} on the interval I = [0, 1],
with infinitesimal mean and variance


$$\mu(x) = -\gamma x, \qquad \sigma^2(x) = x(1 - x), \tag{9.13}$$

respectively. (We have assumed that $\gamma = \lim_{N \to \infty} N u$.) Using (9.13), it is easy to
check that the scale density s(x) and scale measure S(x) of X(t) are given by


$$s(x) = (1 - x)^{-2\gamma}, \qquad S(x) = \frac{1 - (1 - x)^{1 - 2\gamma}}{1 - 2\gamma}, \qquad 2\gamma \ne 1. \tag{9.14}$$


It follows from the results of Section 6, Table 6.2, that if γ ≥ 1/2 then the
boundary point 1 is nonattracting and hence unattainable, and thus π(x), the
probability of a quasi-fixation starting from X(0) = x, is identically zero. When
0 < γ < 1/2, then π(x) = S(x)/S(1) is given explicitly by


$$\pi(x) = 1 - (1 - x)^{1 - 2\gamma}, \qquad 0 < \gamma < \tfrac{1}{2}. \tag{9.15}$$


We now compute the expected time to quasi-fixation, conditional on this event
happening. From the above discussion, we require 0 < γ < 1/2, and then we
consider the diffusion process {X*(t), t ≥ 0} restricted only to those realizations
leading to quasi-fixation. It is clear that the conditioning results
developed in this section apply. In particular, the mean time $E[T_1^*]$ to reach
state 1 starting from X*(0) = x is given by the formula (9.9), with S(x) as in
(9.14). Of particular interest is the case where x = 1/N, corresponding to the
introduction of a single copy of the mutant A-type. Explicitly, we have


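$$E[T_1^* \mid X^*(0) = 1/N] = \frac{2}{1 - 2\gamma} \cdot \frac{(1 - 1/N)^{1 - 2\gamma}}{1 - (1 - 1/N)^{1 - 2\gamma}} \int_0^{1/N} \frac{[1 - (1 - \xi)^{1 - 2\gamma}]^2}{\xi (1 - \xi)^{1 - 2\gamma}}\, d\xi
+ \frac{2}{1 - 2\gamma} \int_{1/N}^1 \frac{1 - (1 - \xi)^{1 - 2\gamma}}{\xi}\, d\xi
= I_1(N) + I_2(N),$$

say. But

$$I_1(N) \le \left( \frac{2}{1 - 2\gamma} \right) \frac{(1 - 1/N)^{1 - 2\gamma}}{1 - (1 - 1/N)^{1 - 2\gamma}} \cdot \frac{[1 - (1 - 1/N)^{1 - 2\gamma}]^2}{(1/N)(1 - 1/N)^{1 - 2\gamma}} \cdot \frac{1}{N}
= \frac{2}{1 - 2\gamma} \left[ 1 - (1 - 1/N)^{1 - 2\gamma} \right] \sim \frac{2}{N},$$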
so that NI,(N) > 2 as N > œ. Also, NI,(N) ~ Nc, where c is a finite constant, 


oe OS aN 
c= eS / k ) k 


It follows that the conditional mean quasi-fixation time starting from x = 1/N 
is, in time units of the original model, N/J,(N) + NIHAN) & N(2 + ¢) as N => œ. 
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BROWNIAN MOTION CONDITIONED ON ITS STATE AT TIME 1 


We construct a number of processes extracted from the Brownian motion
process {B(t), t ≥ 0} by imposing appropriate restrictions on the sample
space.

Let α and β be fixed real numbers. Let W*(t) for 0 ≤ t ≤ 1 denote the
constrained Brownian motion conditioned that


$$\alpha \le B(1) \le \beta. \tag{9.16}$$


Intuitively, W*(t) is a diffusion process with time parameter confined to the
interval 0 ≤ t ≤ 1. We shall now determine the infinitesimal parameters of the
process, which can be time dependent because of the requirement of (9.16). We
follow the recipe of the calculations underlying (9.5a) and (9.5b) with one
essential modification. Let π(x, t) be the probability that from the state value x
at time t the sample path of B(t) satisfies (9.16) at time 1.

If p(t, x, y) denotes the transition density of the unconditioned process at
time t then the transition density of W* is given by


$$p^*(t, x; s, y)\, dy = \Pr\{y < W^*(s) < y + dy \mid W^*(t) = x\}
= \frac{p(s - t, x, y)\, \pi(y, s)\, dy}{\pi(x, t)}, \qquad 0 \le t < s \le 1. \tag{9.17}$$


The justification of (9.17) follows that of (9.1) mutatis mutandis. We are now 
prepared to compute 


$$\mu^*(x, t) = \lim_{h \downarrow 0} \frac{1}{h} \int (y - x)\, p^*(t, x; t + h, y)\, dy. \tag{9.18}$$


All our calculations are done in a formal setting in the spirit of the arguments 
of Section 3. A rigorous validation is substantially more technical. We 
postulate sufficient regularity for π(x, t) to permit the use of the Taylor
expansion,


$$\pi(y, t + h) = \pi(x, t) + (y - x) \frac{\partial \pi}{\partial x}(x, t) + h \frac{\partial \pi}{\partial t}(x, t) + o(y - x) + o(h).$$


This is a mild assumption readily verified [see (9.22)] in the case at hand. 
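Substituting from (9.17) into (9.18) gives


$$\mu^*(x, t) = \lim_{h \downarrow 0} \frac{1}{h} \int (y - x)\, p(h, x, y)\, dy
+ \frac{\partial \pi(x, t)/\partial x}{\pi(x, t)} \lim_{h \downarrow 0} \frac{1}{h} \int (y - x)^2 p(h, x, y)\, dy, \tag{9.19}$$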


again stipulating a strong form of the infinitesimal relations for the B(t) 
process. In the case of the Brownian motion process no obstacles are 
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encountered in these presumptions. By virtue of (9.19) we obtain 


$$\mu^*(x, t) = \mu(x) + \frac{\partial \pi(x, t)/\partial x}{\pi(x, t)}\, \sigma^2(x). \tag{9.20}$$

By similar means we secure

$$\sigma^{*2}(x, t) = \sigma^2(x). \tag{9.21}$$


For the specific case under consideration we have μ(x) = 0, σ²(x) = 1, and
therefore

$$\mu^*(x, t) = \frac{\partial \pi(x, t)/\partial x}{\pi(x, t)}, \qquad \sigma^{*2}(x, t) = 1,$$

where

$$\pi(x, t) = \int_\alpha^\beta \frac{1}{\sqrt{2\pi(1 - t)}}\, e^{-(y - x)^2/[2(1 - t)]}\, dy
= \frac{1}{\sqrt{2\pi}} \int_{(\alpha - x)/\sqrt{1 - t}}^{(\beta - x)/\sqrt{1 - t}} e^{-\xi^2/2}\, d\xi. \tag{9.22}$$


It is sometimes convenient to use the notation $\Pi_{\alpha,\beta}(x, t)$ for expression (9.22).

Of special interest is the specification α = -ε, β = +ε, followed by the
determination of the limiting conditioned diffusion as ε is sent to zero.
The process obtained is called the Brownian bridge, for reasons
indicated later. A simple direct computation shows that for $\Pi_{-\varepsilon,\varepsilon}(x, t) = \Pi(x, t)$
we get


$$\lim_{\varepsilon \downarrow 0} \frac{1}{\Pi(x, t)} \frac{\partial \Pi(x, t)}{\partial x} = -\frac{x}{1 - t}. \tag{9.23}$$


Thus Brownian motion conditioned on B(1) = 0, i.e., the Brownian bridge 
process, corresponds to a diffusion process {W(t), 0 < t < 1} with infinitesimal 
coefficients 


$$\mu(x, t) = -\frac{x}{1 - t}, \qquad \sigma^2(x, t) = 1. \tag{9.24}$$


Sharply contrasting with the conditioned diffusions associated with (9.16), 
the diffusion of (9.24) is constructed by conditioning on the realization from a 
collection of sample paths having probability 0. Despite the complication of 
dealing with constraints inducing events of probability 0, the diffusion process 
determined by the parameters of (9.24) is well defined. Later in this section we 
shall achieve a representation of this process from which a description of its 
properties is readily forthcoming. 
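
The limit leading to (9.24) can be observed numerically. The sketch below is an illustration only: it writes π(x, t) through the standard normal distribution function (via `math.erf`), differentiates it analytically in x, and evaluates the ratio in (9.20) with μ = 0 and σ² = 1 for a shrinking window (-ε, ε). The function names are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pi_ab(x, t, alpha, beta):
    """Probability that alpha <= B(1) <= beta given B(t) = x, for 0 <= t < 1."""
    sd = math.sqrt(1.0 - t)
    return normal_cdf((beta - x) / sd) - normal_cdf((alpha - x) / sd)


def conditioned_drift(x, t, alpha, beta):
    """(d pi / d x) / pi, the conditioned drift when mu = 0 and sigma^2 = 1."""
    sd = math.sqrt(1.0 - t)
    dpi_dx = (normal_pdf((alpha - x) / sd) - normal_pdf((beta - x) / sd)) / sd
    return dpi_dx / pi_ab(x, t, alpha, beta)


x, t = 0.4, 0.5
for eps in (0.5, 0.1, 0.001):
    print(eps, conditioned_drift(x, t, -eps, eps), -x / (1.0 - t))
```

As ε shrinks the computed drift approaches -x/(1 - t), while for a very wide window the conditioning has no effect and the drift returns to zero.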

More generally, Brownian motion conditioned on B(1) = a gives rise to
the diffusion process $\{W_a(t),\ 0 \le t \le 1\}$ with infinitesimal parameters


$$\mu(x, t) = \frac{a - x}{1 - t}, \qquad \sigma^2(x, t) = 1. \tag{9.25}$$
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We shall provide another approach and representation for the Brownian 
bridge: Consider Brownian motion subject to a deterministic change of time 
variable and a scaling of the state variable of the explicit form 


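$$\Gamma(t) = (1 - t)\, B\!\left( \frac{t}{1 - t} \right), \qquad 0 \le t < 1. \tag{9.26}$$


The process Γ(t) is obviously a diffusion, and it remains to ascertain its
infinitesimal mean and variance. This will be done by direct evaluation of


$$\lim_{h \downarrow 0} \frac{1}{h} E[\Gamma(t + h) - x \mid \Gamma(t) = x] = \mu(x, t), \qquad
\lim_{h \downarrow 0} \frac{1}{h} E[(\Gamma(t + h) - x)^2 \mid \Gamma(t) = x] = \sigma^2(x, t), \tag{9.27}$$


by reducing the calculation to a familiar one involving Brownian motion.
Note that Γ(t) = x corresponds to B(t/(1 - t)) = x/(1 - t), and because
Brownian motion has zero drift, that


$$E\left[ B\!\left( \frac{t + h}{1 - t - h} \right) \,\Big|\, B\!\left( \frac{t}{1 - t} \right) = y \right] = y \qquad \text{for } t > 0,\ h > 0.$$


Therefore

$$E[\Gamma(t + h) - x \mid \Gamma(t) = x]
= E\left[ (1 - t - h)\, B\!\left( \frac{t + h}{1 - t - h} \right) - x \,\Big|\, B\!\left( \frac{t}{1 - t} \right) = \frac{x}{1 - t} \right]
= (1 - t - h) \frac{x}{1 - t} - x = -\frac{xh}{1 - t}. \tag{9.28}$$


It follows when dividing by h and sending h to 0 that (9.28) converges to
-x/(1 - t). A similar calculation verifies that σ²(x, t) = 1.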

The above analysis establishes that the Brownian bridge identified as 
Brownian motion conditioned such that B(1) = 0 can be realized by the 
transformed process 


$$\Gamma(t) = (1 - t)\, B\!\left( \frac{t}{1 - t} \right), \qquad 0 \le t < 1, \tag{9.29}$$


where B(s) is standard Brownian motion. Every sample path of B(s) induces a
sample path of the Brownian bridge process through the representation (9.29).
Notice that Γ(1) vanishes since B(u)/u → 0 as u → ∞ with probability 1 (this
fact is established in Chapter 7). Observe the further property that Γ(t) is a
Gaussian process [since B(s) is Gaussian] and we now compute its covariance
function. Clearly E[Γ(t)] = (1 - t)E[B(t/(1 - t))] = 0. Consider now


$$E[\Gamma(s_1)\Gamma(s_2)] = (1 - s_1)(1 - s_2)\, E\left[ B\!\left( \frac{s_1}{1 - s_1} \right) B\!\left( \frac{s_2}{1 - s_2} \right) \right]
= (1 - s_1)(1 - s_2) \min\left( \frac{s_1}{1 - s_1}, \frac{s_2}{1 - s_2} \right)
= \begin{cases} s_1(1 - s_2), & s_1 < s_2, \\ s_2(1 - s_1), & s_1 > s_2. \end{cases} \tag{9.30}$$

A third version of the Brownian bridge process is

$$\hat{B}(t) = B(t) - tB(1), \qquad 0 \le t \le 1. \tag{9.31}$$


To validate this statement we need merely verify three properties:


(i) $\hat{B}(t)$ is Markov. (Why?)
(ii) $\hat{B}(t)$ is a Gaussian process. (This is obvious since B(t) is Gaussian and
$\hat{B}(t)$ is constructed as a linear operation on a Gaussian process.)
(iii) $E[\hat{B}(t)] = 0$ and


$$E[\hat{B}(s_1)\hat{B}(s_2)] = \begin{cases} s_1(1 - s_2) & \text{for } s_1 < s_2, \\ s_2(1 - s_1) & \text{for } s_2 < s_1. \end{cases}$$


Indeed,


$$\begin{aligned}
E[\hat{B}(s_1)\hat{B}(s_2)] &= E[\{B(s_1) - s_1 B(1)\}\{B(s_2) - s_2 B(1)\}] \\
&= E[B(s_1)B(s_2)] - s_1 E[B(1)B(s_2)] - s_2 E[B(s_1)B(1)] + s_1 s_2 E[B(1)^2] \\
&= \min(s_1, s_2) - s_1 s_2,
\end{aligned}$$


and this formula is identical to (9.30). 

Two Gaussian processes with identical mean and covariance functions share the
same finite-dimensional distribution functions. Thus $\hat{B}(t)$ and Γ(t) identify the
same diffusion, characterized by the infinitesimal parameters (9.24).
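
The agreement of the two covariance functions can also be checked numerically. The following sketch is illustrative: it evaluates the two closed-form covariances and, as an additional sanity check, estimates the covariance of B(t) - tB(1) by Monte Carlo, sampling the Brownian path at the three times s1 < s2 < 1 from independent Gaussian increments. The function names, the sample size, and the seed are arbitrary choices.

```python
import math
import random


def cov_time_change(s1, s2):
    """Covariance of (1 - t) B(t / (1 - t)) at times s1, s2 in [0, 1)."""
    return (1.0 - s1) * (1.0 - s2) * min(s1 / (1.0 - s1), s2 / (1.0 - s2))


def cov_tied_down(s1, s2):
    """Covariance of B(t) - t B(1): min(s1, s2) - s1 * s2."""
    return min(s1, s2) - s1 * s2


def empirical_cov(s1, s2, n_paths=20000, seed=0):
    """Monte Carlo estimate of the covariance of B(t) - t B(1), for s1 < s2 < 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        b1 = rng.gauss(0.0, math.sqrt(s1))                # B(s1)
        b2 = b1 + rng.gauss(0.0, math.sqrt(s2 - s1))      # B(s2)
        b_end = b2 + rng.gauss(0.0, math.sqrt(1.0 - s2))  # B(1)
        total += (b1 - s1 * b_end) * (b2 - s2 * b_end)
    return total / n_paths


s1, s2 = 0.3, 0.7
print(cov_time_change(s1, s2), cov_tied_down(s1, s2), empirical_cov(s1, s2))
```

Both closed forms reduce to s1(1 - s2) when s1 < s2, and the simulated value should agree up to sampling error.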

A final direct approach to the acquisition of the Brownian bridge is to exploit 
the interpolation formulas of Theorem 2.1 of Chapter 7. The joint distribution 
of 


$$B(t_1), \ldots, B(t_n), \qquad 0 < t_1 < \cdots < t_n < 1,$$

conditioned on B(0) = 0 and B(1) = 0 is multivariate normal having a covariance
matrix of elements


$$\sigma_{ij} = E[B(t_i)B(t_j)] = t_i(1 - t_j), \qquad i \le j.$$


This is a direct extension of Theorem 2.1 of Chapter 7. 

The name Brownian bridge derives from the constraints B(0) = B(1) = 0 
and the feature that if a path {B(t), 0 < t < 1} is also nonnegative its picture 
looks like a bridge (see Fig. 6). A synonym is “tied down Brownian motion.” 


FIG. 6 


The Brownian bridge process is intimately associated with certain random 
functionals of empirical distribution functions based on independent obser- 
vations (cf. Chapter 13). 


AN EXAMPLE OF BROWNIAN MOTION CONDITIONED ON ITS GROWTH BEHAVIOR 


Finally, we shall construct a conditioned diffusion process obtained by imposing
a growth bound on the sample trajectory. The potential usefulness of such
constructions is clearly shown by the following concrete case. Let {B(t), t ≥ 0}
represent standard Brownian motion and define the process


$$Z^*(t) = B(t) \text{ constrained such that } B(t) \le \alpha t + \beta \text{ for all } t > 0. \tag{9.32}$$


The Z*(t) process is a diffusion (why?), and to ascertain the associated
infinitesimal parameters it is essential to evaluate


$$\pi(x) = \Pr\{B(t) \le \alpha t + \beta \text{ for all } t \mid B(0) = x\}.$$


Consulting Elementary Problem 7, Chapter 7, we find that


$$\pi(x) = \begin{cases} 1 - e^{-2\alpha(\beta - x)}, & x \le \beta, \\ 0, & x > \beta. \end{cases}$$
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We can now compute the infinitesimal parameters of the Z*(t) process following 
the recipe of (9.5a,b). In this vein, we obtain 


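$$\sigma^{*2}(x) = \sigma^2(x) = 1,$$


$$\mu^*(x) = \begin{cases}
\mu(x) + \dfrac{\pi'(x)}{\pi(x)} = 0 + \dfrac{-2\alpha e^{-2\alpha(\beta - x)}}{1 - e^{-2\alpha(\beta - x)}} = \dfrac{-2\alpha}{e^{2\alpha(\beta - x)} - 1} & \text{for } x < \beta, \\[2ex]
0 & \text{for } x > \beta.
\end{cases}$$


Note that μ*(x) is discontinuous in this example.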
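
The drift of the growth-conditioned process can be cross-checked by differentiating π(x) numerically. The sketch below is illustrative and assumes α > 0 (so that the path stays under the line with positive probability); the function names and the finite-difference step are arbitrary choices.

```python
import math


def pi_growth(x, alpha, beta):
    """Probability that B(t) <= alpha*t + beta for all t, given B(0) = x (alpha > 0 assumed)."""
    if x > beta:
        return 0.0
    return 1.0 - math.exp(-2.0 * alpha * (beta - x))


def drift_closed_form(x, alpha, beta):
    """-2 alpha / (exp(2 alpha (beta - x)) - 1), valid for x < beta."""
    return -2.0 * alpha / (math.exp(2.0 * alpha * (beta - x)) - 1.0)


def drift_numeric(x, alpha, beta, dx=1e-6):
    """pi'(x) / pi(x) by a central difference."""
    dpi = (pi_growth(x + dx, alpha, beta) - pi_growth(x - dx, alpha, beta)) / (2.0 * dx)
    return dpi / pi_growth(x, alpha, beta)


alpha, beta = 0.5, 1.0
for x in (-1.0, 0.0, 0.9):
    print(x, drift_numeric(x, alpha, beta), drift_closed_form(x, alpha, beta))
```

The drift is negative everywhere below the line and grows without bound in magnitude as x approaches β, which is how the conditioned path is kept from crossing.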


10: Some Natural Diffusion Models with Killing


A. A DIFFUSION APPROXIMATION TO A PROBLEM IN THE GENETICS OF RECOMBINATION


Consider a finite diploid population in which two alleles (types) are segregating 
at each of two linked loci. The problem is to determine the probability that a 
recombination event will occur before the population fixes on one chromo- 
some type. 

Specifically, suppose there is a population of N individuals. Each in- 
dividual has a number of chromosomes which occur in homologous pairs. We 
are concerned with the genetic constitution at just two loci, which are 
“linked” so that the genes occurring at the two loci are not independent. 
Suppose that at locus (position) 1 there are two possible alleles, A and a, and
at locus 2 there are two possible alleles, B and b. An individual can then express
any of ten possible genotypes


    A        B      (chromosome)
    a        B      (the homologous chromosome)
  locus 1  locus 2


Writing each genotype as chromosome/homologous chromosome, the alternative
chromosome pairs are


AB/AB, AB/Ab, Ab/Ab, AB/aB, AB/ab, Ab/aB, Ab/ab, aB/aB, aB/ab, ab/ab.


[Note that the chromosome ordering is irrelevant: e.g., AB/aB and aB/AB are
indistinguishable. However the phase is important, i.e., Ab/aB and AB/ab are
different genotypes.]

In the formation of the next generation, individuals produce (segregate)
gametes, the equivalent of one chromosome from each homologous pair. Two
gametes in the population will then join to form an individual of the next
generation. It is in the segregation of the gametes that recombination may occur. Thus
an Ab/aB individual may produce an Ab-gamete, or an aB-gamete; but also, by
recombination, it may produce AB- and ab-gametes. When two loci are linked
these recombinant gametes are less likely to be produced than the other
(parental) gametes. The recombination parameter r quantifies the linkage:

r = probability of a recombination.


For example, the gametic output of an Ab/aB individual contains the four
chromosomal types Ab, aB, AB, ab in the proportions ½(1 - r), ½(1 - r), ½r, ½r,
respectively. (Note that for many genotypes recombination is irrelevant to the
production of gametes: e.g., AB/Ab segregates ½AB and ½Ab irrespective of
recombination.)

Suppose that the population at hand initially contains only the gamete
types Ab and aB, where all individuals are of genotypes Ab/Ab, Ab/aB, or aB/aB.
If r is small, then it is possible that many generations will pass by without
the appearance of an individual with a recombinant chromosome (AB or ab)
produced from the genotype Ab/aB. During that time, it could happen that the
population reaches a state in which everyone is of the type Ab/Ab; the population
can never leave that state, for only Ab-gametes can possibly be produced, and
the population is "fixed" on Ab. Similarly the population could fix on aB.

The problem is then to determine the probability, given the initial makeup, 
that a recombinant gamete appears in the population before fixation occurs 
and prevents that possibility. 

We now set up a model for this problem, in the course of which the 
mechanics of the formation of one generation from the preceding one will be 
clarified. Suppose that no recombination has yet occurred. There are 2N 
gametes of N individuals in the population: suppose there are i of type Ab, and 
hence 2N − i of type aB. Of course, to fully describe the population we need
not only the gamete frequencies but also the genotype frequencies, that is,
how the gametes are paired in individuals (are they mostly Ab/Ab and aB/aB with
just a few Ab/aB, or vice versa?). However, if N is not too small, the following
approximation gives sufficient accuracy, and greatly simplifies the analysis:
namely, that the (assumed) random mating of individuals in the previous
generation, which is equivalent to the random union of their gametes, entails


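the genotype frequencies

$$\frac{Ab}{Ab}:\ \left(\frac{i}{2N}\right)^2, \qquad \frac{Ab}{aB}:\ 2\,\frac{i}{2N}\left(1 - \frac{i}{2N}\right), \qquad \frac{aB}{aB}:\ \left(1 - \frac{i}{2N}\right)^2 .$$

From this population, an Ab-gamete could be produced either by an Ab/Ab
individual with probability 1, or by an Ab/aB individual with probability $\tfrac12(1 - r)$.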
a 


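So, when the number of Ab-types is i, then

$$p_i = \Pr\{\text{a gamete produced is } Ab\} = \left(\frac{i}{2N}\right)^2 + 2\,\frac{i}{2N}\left(1 - \frac{i}{2N}\right)\cdot\tfrac12(1 - r) = \frac{i}{2N}\left[1 - r\left(1 - \frac{i}{2N}\right)\right].$$

Similarly,

$$q_i = \Pr\{\text{a gamete produced is } aB\} = \left(1 - \frac{i}{2N}\right)^2 + 2\,\frac{i}{2N}\left(1 - \frac{i}{2N}\right)\cdot\tfrac12(1 - r) = \left(1 - \frac{i}{2N}\right)\left[1 - r\,\frac{i}{2N}\right].$$

Finally, the probability that a gamete produced is of a recombinant type is

$$1 - (p_i + q_i) = 2r\,\frac{i}{2N}\left(1 - \frac{i}{2N}\right).$$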


The 2N gametes which will make up the individuals of the next generation are 
chosen by random binomial sampling from this pool of gametes. 

Thus the transition probability, $P_{ij}$, that there are j gametes of type Ab, and
2N − j of type aB (and hence no recombinant gamete) is

$$P_{ij} = \binom{2N}{j}\, p_i^{\,j}\, q_i^{\,2N-j}, \qquad 0 \le i, j \le 2N.$$

The probability that no recombinant gamete makes it into the next
generation is $(p_i + q_i)^{2N}$, so that the probability that one or more recombinants
do appear is

$$1 - (p_i + q_i)^{2N}.$$


Notice that it is assumed that the population size N is the same for each 
generation, and that there are no selection pressures—every individual sur- 
vives to breeding age and has an equal chance to contribute to the next 
generation. 

The situation is now that of a Markov chain with 2N + 2 states, 2N + 1 of 
which represent the state in which no recombinant gamete has yet appeared, 
and the population now contains j Ab- and 2N —j aB-gametes 
(j = 0, 1,..., 2N). The other state is that a recombinant gamete has appeared in 
an individual. For our purposes this is a killing state. Admittedly an AB- or 
ab-gamete could appear and then not make it into the next generation, and 
never appear again, but what matters is that a recombination has occurred. 
Strictly, this state is that AB or ab has been present in some individual, past or 
present. 

As noted earlier, the states j = 0 and 2N are absorbing. 


FIG. 7 


Starting from the initial state i, a population will almost surely reach one of
the absorbing states or be killed. What is the probability $R_i$ that it eventually
reaches the recombinant state R?

From the theory of Markov chains this can be solved in the following way:
if $u_i = 1 - R_i$ = probability that the population eventually fixes, then
$(u_0, u_1, \ldots, u_{2N})$ satisfy the equations

$$u_i = \sum_{j=0}^{2N} P_{ij} u_j, \qquad i = 0, 1, \ldots, 2N,$$

with the boundary conditions

$$u_0 = u_{2N} = 1.$$


For large N the solution of these equations is, in practice, impossible, 
analytically or numerically. We proceed to approximate the Markov chain by 
a diffusion process. Numerical computation indicates that for N > 10, this 
approximation gives satisfactory accuracy. Strictly speaking, the relevant 
approximating process is a “diffusion process with killing”: we follow the 
progress of the frequency of the Ab gamete, until the process is “killed” by the 
appearance of a recombinant gamete or fixation occurs. 

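At the nth generation, the state variable X(n) can take the values
X(n) = j/2N, j = 0, 1, ..., 2N. Given that no recombinant then appears, X(n + 1)
is distributed as 1/2N times a binomial variable with parameters
$\{2N,\ p_j/(p_j + q_j)\}$. The probability that no recombinant appears is, as we
have noted before, $(p_j + q_j)^{2N}$. Then, taking expectations over the event that
the process is not killed,

$$E[X(n+1) - X(n) \mid X(n) = j/2N = x] = \left(\frac{p_j}{p_j + q_j} - x\right)(p_j + q_j)^{2N} = [\,p_j - x(p_j + q_j)\,](p_j + q_j)^{2N-1},$$

$$E[(X(n+1) - X(n))^2 \mid X(n) = j/2N = x] = \left\{\frac{1}{2N}\,\frac{p_j}{p_j + q_j}\,\frac{q_j}{p_j + q_j} + \left(\frac{p_j}{p_j + q_j} - x\right)^2\right\}(p_j + q_j)^{2N} = \left\{\frac{p_j q_j}{2N} + [\,p_j - x(p_j + q_j)\,]^2\right\}(p_j + q_j)^{2N-2},$$

and $\Pr\{\text{killed} \mid X(n) = j/2N = x\} = 1 - (p_j + q_j)^{2N}$.

Introducing a new recombination parameter ρ defined by

$$r = \rho/N^2,$$

and regarding ρ as a constant, we see that

$$E[\Delta X \mid X(n) = x] = O(1/N^2),$$
$$E[(\Delta X)^2 \mid X(n) = x] = x(1 - x)/2N + O(1/N^2),$$
$$\Pr\{\text{killed} \mid X(n) = x\} = 4\rho x(1 - x)/N + O(1/N^2).$$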
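
These orders of magnitude can be checked directly. The following short verification is a sketch that writes x = j/2N and uses only the formulas for $p_j$ and $q_j$ given earlier together with $r = \rho/N^2$:

$$p_j + q_j = x[1 - r(1 - x)] + (1 - x)[1 - rx] = 1 - 2r\,x(1 - x),$$
$$p_j - x(p_j + q_j) = -r\,x(1 - x)(1 - 2x) = O(1/N^2),$$
$$\frac{p_j q_j}{2N} = \frac{x(1 - x)}{2N}\,[1 + O(1/N^2)],$$
$$1 - (p_j + q_j)^{2N} = 1 - \left(1 - \frac{2\rho\,x(1 - x)}{N^2}\right)^{2N} = \frac{4\rho\,x(1 - x)}{N} + O(1/N^2).$$

Since $(p_j + q_j)^{2N-1}$ and $(p_j + q_j)^{2N-2}$ are both $1 + O(1/N)$, the three stated estimates follow.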


Now we rescale time so that one unit of time corresponds to the passage of N
generations, i.e., let

$$Y_N(t) = X([Nt]).$$

Finally, let $N \to \infty$. Then the processes $Y_N(t)$ will tend in the limit to a diffusion
process Y(t) with killing. The infinitesimal parameters of the Y-process are to
be found in the limit from the X parameters, remembering that ΔX is over a
rescaled time period of 1/N units: thus,

$$\mu(x) = \lim_{N\to\infty} \frac{1}{(1/N)}\, E[\Delta X \mid X(n) = x] = 0,$$
$$\sigma^2(x) = \lim_{N\to\infty} \frac{1}{(1/N)}\, E[(\Delta X)^2 \mid X(n) = x] = \frac{x(1 - x)}{2},$$
$$k(x) = \lim_{N\to\infty} \frac{1}{(1/N)}\, \Pr\{\text{killed} \mid X(n) = x\} = 4\rho x(1 - x).$$


Y is then a diffusion process with killing on the interval [0,1]: the problem is 
to determine the probability that the process reaches the absorbing states 0 or 
1 (where the killing rate is zero) before it gets killed. 

There are two ways of approaching this question. We may treat this as a
modified type of diffusion process, with three relevant parameters $(\mu, \sigma^2, k)$, and
obtain the solution essentially from first principles. Alternatively, we can
approach it via the Kac functional (Section 5), treating it as an ordinary $(\mu, \sigma^2)$
diffusion process, but modifying the expectations and probabilities by the
killing factor $\exp[-\int_0^t k(Y(\tau))\,d\tau]$. These two methods are, of course, equivalent,
and they both lead to the following result:

Let u(x, t) = Pr{starting at x, the Y process has not been killed by time t}.
Then u satisfies the differential equation [cf. (5.39)]

$$\frac{\partial u}{\partial t} = \frac12 \sigma^2(x)\frac{\partial^2 u}{\partial x^2} + \mu(x)\frac{\partial u}{\partial x} - k(x)u,$$

together with

$$u(0, t) = u(1, t) = 1, \qquad u(x, 0) = 1.$$

Clearly, u(x, t) is decreasing in t and is bounded below by 0. So, as $t \to \infty$,
u(x, t) tends to a limiting value u(x, ∞) = u(x), and $\partial u/\partial t \to 0$.
Therefore, u(x) satisfies

$$0 = \frac12 \sigma^2(x)\frac{d^2 u}{dx^2} + \mu(x)\frac{du}{dx} - k(x)u, \qquad u(0) = u(1) = 1,$$

and u(x) has the interpretation

$$u(x) = \Pr\{\text{process is never killed} \mid Y(0) = x\} = \Pr\{\text{process reaches 0 or 1} \mid Y(0) = x\}.$$

The probability we require is Z(x) = 1 − u(x). This satisfies

$$\tfrac12 \sigma^2(x) Z'' + \mu(x) Z' - k(x) Z = -k(x), \qquad Z(0) = Z(1) = 0.$$

Substituting for $\mu$, $\sigma^2$, k and canceling common factors, this simplifies to

$$\tfrac14 Z'' - 4\rho Z = -4\rho, \qquad Z(0) = Z(1) = 0.$$
Noting the trivial particular solution, Z(x) = 1, to this inhomogeneous differential
equation, we try to fit the general solution

$$Z(x) = 1 + A\sinh(4\sqrt{\rho}\,x) + B\cosh(4\sqrt{\rho}\,x)$$

to the boundary conditions. This is straightforward and yields

$$Z(x) = 1 - \frac{\sinh(4\sqrt{\rho}\,x) + \sinh(4\sqrt{\rho}\,(1 - x))}{\sinh(4\sqrt{\rho})}.$$

Of greater interest is the following modification: suppose that only the
appearance of an AB-recombinant gamete is of concern and kills the process,
and that if an ab-gamete appears in an individual it is recognized, discarded,
and replaced by a new, randomly chosen gamete. Obviously, this simply has
the effect of halving the rate of killing, so that R(x), the probability of
eventually obtaining an AB-gamete, is

$$R(x) = 1 - \frac{\sinh(\sqrt{8\rho}\,x) + \sinh[\sqrt{8\rho}\,(1 - x)]}{\sinh\sqrt{8\rho}}.$$

As would be expected, we can see immediately that if ρ is large (so that
$r \gg 1/N^2$) then $R \approx 1$, and recombination before fixation is virtually certain.
On the other hand, if ρ is small, then $R \approx 0$, and fixation almost certainly
occurs first.

The maximum probability for the appearance of an AB-recombinant
occurs when the initial state is $x = \tfrac12$:

$$R(\tfrac12) = 1 - \frac{2\sinh\sqrt{2\rho}}{\sinh 2\sqrt{2\rho}} = 1 - \operatorname{sech}\sqrt{2\rho} = 1 - \operatorname{sech}(N\sqrt{2r}) \approx 1 - 2e^{-N\sqrt{2r}},$$

the last approximation applying if $N\sqrt{r}$ is large.

The formula $N = (1/\sqrt{2r})\,\operatorname{sech}^{-1}(1 - R(\tfrac12))$ enables one to calculate the
population size needed to obtain an AB-recombinant with a specified confidence $R(\tfrac12)$,
starting from the initial state $x = \tfrac12$. Thus, to achieve $R(\tfrac12) = 0.99$, we need
$N \approx (1/\sqrt{2r})\log 200$.
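
For a small population the linear system $u_i = \sum_j P_{ij} u_j$ can still be solved directly, which gives a way to see how close the diffusion answer is. The sketch below is illustrative only: the function names are hypothetical, and it assumes the binomial transition probabilities $P_{ij} = \binom{2N}{j} p_i^j q_i^{2N-j}$ and the closed form $Z(x) = 1 - [\sinh(4\sqrt{\rho}x) + \sinh(4\sqrt{\rho}(1-x))]/\sinh(4\sqrt{\rho})$ with $r = \rho/N^2$.

```python
import math
import numpy as np

def fixation_probabilities(N, r):
    """Return (x, u): x[i] = i/2N and u[i] = Pr{fixation before any recombinant}."""
    M = 2 * N
    i = np.arange(M + 1)
    x = i / M
    p = x * (1.0 - r * (1.0 - x))      # Pr{gamete produced is Ab}
    q = (1.0 - x) * (1.0 - r * x)      # Pr{gamete produced is aB}
    coeff = np.array([math.comb(M, j) for j in range(M + 1)], dtype=float)
    # P[i, j] = C(2N, j) p_i^j q_i^(2N-j); row sums are (p_i + q_i)^(2N) <= 1.
    P = coeff * p[:, None] ** i * q[:, None] ** (M - i)
    inner = np.arange(1, M)
    # Solve u_i = sum_j P_ij u_j on the interior states with u_0 = u_2N = 1.
    lhs = np.eye(M - 1) - P[np.ix_(inner, inner)]
    rhs = P[inner, 0] + P[inner, M]
    u = np.ones(M + 1)
    u[inner] = np.linalg.solve(lhs, rhs)
    return x, u

def diffusion_recombination_probability(x, rho):
    """Diffusion approximation Z(x) to the probability of recombination before fixation."""
    a = 4.0 * math.sqrt(rho)
    return 1.0 - (np.sinh(a * x) + np.sinh(a * (1.0 - x))) / math.sinh(a)

if __name__ == "__main__":
    N, rho = 20, 1.0
    x, u = fixation_probabilities(N, rho / N**2)
    gap = np.max(np.abs((1.0 - u) - diffusion_recombination_probability(x, rho)))
    print(f"N = {N}: largest |chain - diffusion| gap = {gap:.4f}")
```

The direct solve uses a dense (2N − 1) × (2N − 1) system and floating-point binomial coefficients, so it is only practical for modest N; that limitation is the reason the text turns to the diffusion limit.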


B. THE DETECTION OF A RECESSIVE VISIBLE GENE IN A FINITE POPULATION 


Another interesting application of a diffusion with killing is provided by the 
following problem from population genetics. Consider a finite population in 
which two alleles, denoted by A and a, are segregating at a particular locus. 
What is the probability that an aa-genotype will be formed before the 
population fixes on the AA-genotype? 

We can formulate the problem in the following way. Suppose the popu- 
lation comprises N individuals, and that the aa-genotype is visible as soon as it 
is formed. This might correspond, for example, to the a-allele being lethal in 
homozygous (aa) condition. Let $X_n$ denote the number of heterozygotes (Aa) at
time n. Assuming that we have not yet detected an aa-genotype, we can
suppose that the population comprises i Aa-individuals and N − i AA-individuals.
To find out the population composition at the next time point, we
will assume there is random mating in the population, and so the possible
mating types have the following relative frequencies:

$$AA \times AA:\ (1 - i/N)^2, \qquad Aa \times Aa:\ (i/N)^2, \qquad Aa \times AA:\ 2(i/N)(1 - i/N).$$

From matings of this type, an AA-individual can be produced with
probability 1 from an AA × AA mating, with probability $\tfrac14$ from an Aa × Aa
mating, and with probability $\tfrac12$ from the final mating type. Hence,


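$$p_i = \Pr\{\text{produce an AA-type}\} = \left(1 - \frac{i}{N}\right)^2 + \frac14\left(\frac{i}{N}\right)^2 + \frac12\cdot 2\,\frac{i}{N}\left(1 - \frac{i}{N}\right) = \left(1 - \frac{i}{2N}\right)^2.$$

Similarly,

$$q_i = \Pr\{\text{produce an Aa-type}\} = \frac12\left(\frac{i}{N}\right)^2 + \frac12\cdot 2\,\frac{i}{N}\left(1 - \frac{i}{N}\right) = \frac{i}{N}\left(1 - \frac{i}{2N}\right),$$

while

$$r_i = \Pr\{\text{produce an aa-type}\} = \left(\frac{i}{2N}\right)^2.$$

To form the next generation of N individuals, we take a (multinomial) sample
of size N according to the probabilities $p_i, q_i, r_i$. Recall that if we sample any
aa-individuals, then the process stops, and so if $X_n = i$, then the process
continues to $X_{n+1} = j$ with probability

$$P_{ij} = \binom{N}{j}\, q_i^{\,j}\, p_i^{\,N-j}, \qquad 0 \le i, j \le N.$$

This transition probability matrix is substochastic, that is, the rows no longer
all sum to one. This corresponds to the fact that the chain can be killed by
detection of an aa-individual. One way to rectify this defect is to add on an
extra state H (for detected homozygote), and then

$$P_{iH} = 1 - \sum_{j=0}^{N} P_{ij} = 1 - (1 - r_i)^N, \qquad 0 \le i \le N,$$

and

$$P_{HH} = 1, \qquad P_{Hi} = 0, \qquad 0 \le i \le N.$$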


Notice that in this model we have assumed that there are no mutation or 
selection pressures acting on the population, and that the population size is 
fixed. State 0 is an absorbing state, corresponding to fixation of the A-allele,
while state H is the killing state, and corresponds to formation of an aa- 
individual. 

The first problem we analyse involves finding the probability $u_i$ that
fixation occurs before detection when the population starts from $X_0 = i$. As in
the previous example in this section, this probability can in principle be found
by solving the system of equations

$$u_i = \sum_{j=0}^{N} P_{ij} u_j, \qquad i = 1, \ldots, N,$$

subject to $u_0 = 1$. In practice, the system cannot be solved exactly, and even
numerical computation is difficult for large values of N. In the spirit of the
previous example, we will resort to diffusion approximation to determine the
detection probability $v_i = 1 - u_i$. Once again, the relevant approximating
process is a diffusion with killing.

Using the form of the transition matrix given above, we find that

$$E[X_{n+1} - i \mid X_n = i] = [\,N q_i - i(q_i + p_i)\,](1 - r_i)^{N-1},$$

$$E[(X_{n+1} - i)^2 \mid X_n = i] = [\,N(N - 1)q_i^2 - (2i - 1)N q_i(1 - r_i) + i^2(1 - r_i)^2\,] \times (1 - r_i)^{N-2},$$

and

$$\Pr\{X \text{ killed at time } n + 1 \mid X_n = i\} = 1 - (1 - r_i)^N.$$

We define the process $X_N(n) = X_n/(2N)^{1/3}$. (The factor 2 is a convenience that
simplifies later formulas.) Then

$$E[X_N(n + 1) - x \mid X_N(n) = x = i/(2N)^{1/3}] = O(N^{-2/3}),$$

$$E[(X_N(n + 1) - x)^2 \mid X_N(n) = x] = \frac{x}{(2N)^{1/3}} + O(N^{-2/3}),$$

and

$$\Pr\{X_N(n + 1) \text{ killed} \mid X_N(n) = x\} = \frac{x^2}{2(2N)^{1/3}} + O(N^{-2/3}).$$
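
The exponent 1/3 can be motivated by a heuristic balance. The following is a sketch, not part of the original derivation; the exponents α and β are introduced here only for illustration. With $X_n = i \ll N$, one generation gives a conditional variance of about $N q_i \approx i$ and a killing probability of about $N r_i = i^2/4N$. Scale the state as $i = x(2N)^{\alpha}$ and let one time unit be $(2N)^{\beta}$ generations:

$$\text{variance per unit time} \approx (2N)^{\beta}\,\frac{i}{(2N)^{2\alpha}} = x\,(2N)^{\beta - \alpha},$$
$$\text{killing rate per unit time} \approx (2N)^{\beta}\,\frac{i^2}{4N} = \frac{x^2}{2}\,(2N)^{\beta + 2\alpha - 1}.$$

Both quantities have finite nonzero limits only if $\beta = \alpha$ and $\beta + 2\alpha = 1$, that is, $\alpha = \beta = 1/3$, in which case they reduce to x and $x^2/2$.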


We now rescale time so that one unit of time in the new process corresponds
to the passage of $(2N)^{1/3}$ generations. That is, we set

$$Y_N(t) = X_N([(2N)^{1/3} t]), \qquad t \ge 0.$$

If we let $N \to \infty$, the processes $Y_N(t)$ will tend to a diffusion Y(t) with killing on
the interval $[0, \infty)$, whose infinitesimal parameters are determined from those
of $X_N$. Recalling that time is now measured over rescaled time units of
$(2N)^{-1/3}$, we obtain

$$\mu(x) = \lim_{N\to\infty} (2N)^{1/3} E[\Delta X_N \mid X_N(n) = x] = 0,$$

$$\sigma^2(x) = \lim_{N\to\infty} (2N)^{1/3} E[(\Delta X_N)^2 \mid X_N(n) = x] = x,$$

$$k(x) = \lim_{N\to\infty} (2N)^{1/3} \Pr[\text{killed} \mid X_N(n) = x] = x^2/2.$$


Since state {0} is an exit boundary for Y(t), we will determine the
probability u(x) that fixation at 0 occurs before killing. As in the previous
example, u(x) satisfies the equation

$$\frac{\sigma^2(x)}{2} u''(x) + \mu(x) u'(x) - k(x) u(x) = 0,$$

with boundary conditions u(0) = 1 and u(∞) = 0. Substituting for $\sigma^2$, $\mu$ and k,
and canceling common factors leads to

$$u''(x) - x u(x) = 0, \qquad u(0) = 1, \quad u(\infty) = 0. \qquad (10.1)$$

This differential equation is known as Airy's equation, and arises in the study
of radio waves, light spectra and the theory of differential equations. The two
standard solutions are known as the Airy functions, A(x) and B(x), and are
defined by

$$A(x) = \frac{x^{1/2}}{3}\left[I_{-1/3}\!\left(\frac{2x^{3/2}}{3}\right) - I_{1/3}\!\left(\frac{2x^{3/2}}{3}\right)\right],$$

$$B(x) = \left(\frac{x}{3}\right)^{1/2}\left[I_{-1/3}\!\left(\frac{2x^{3/2}}{3}\right) + I_{1/3}\!\left(\frac{2x^{3/2}}{3}\right)\right],$$

where $I_\nu(x)$ is the modified Bessel function of order ν. The function A(x) can be
characterized (up to a multiplicative constant) as the unique strictly decreasing
solution of (10.1) vanishing at ∞. The solution we require is

$$u(x) = \frac{A(x)}{A(0)},$$

and so the probability of detection (before fixation) is given by

$$v(x) = 1 - \frac{A(x)}{A(0)}.$$

This function can be readily evaluated using tabulated values of the Airy
function.
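
In place of tables, the detection probability $v(x) = 1 - A(x)/A(0)$ can be evaluated numerically. The sketch below assumes SciPy is available and uses `scipy.special.airy`, which returns the tuple (Ai, Ai', Bi, Bi'); the function names `detection_probability` and `scaled_start` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np
from scipy.special import airy

def detection_probability(x):
    """v(x) = 1 - Ai(x)/Ai(0): probability of detection before fixation, x >= 0."""
    ai_x = airy(np.asarray(x, dtype=float))[0]   # first entry of the tuple is Ai
    ai_0 = airy(0.0)[0]
    return 1.0 - ai_x / ai_0

def scaled_start(i, N):
    """Scaled initial state x = i / (2N)^(1/3) for i heterozygotes among N individuals."""
    return i / (2.0 * N) ** (1.0 / 3.0)

if __name__ == "__main__":
    N = 500
    for i in (1, 5, 10, 20):
        x = scaled_start(i, N)
        print(f"i = {i:2d}, x = {x:.3f}, v(x) = {float(detection_probability(x)):.4f}")
```

The printed values are diffusion approximations; how well they match the finite-N chain for a given N is not checked here.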
From a biological viewpoint, the most interesting part of the evolution of
Y(t) involves those sample paths that result in eventual detection, as opposed 
to absorption at state 0. In what follows, we derive some properties of the 
process Y*(t) obtained by conditioning on eventual detection (killing). 

If p(t, x, y) is the transition density function of Y(t), then a simple conditioning
argument (cf. (9.1)) shows that the transition density of the conditioned
process is given by

$$p^*(t, x, y) = p(t, x, y)\,\frac{v(y)}{v(x)}.$$

Consider now

$$\mu^*(x) = \lim_{h\downarrow 0} \frac{1}{h} E[Y^*(h) - x \mid Y^*(0) = x] = \lim_{h\downarrow 0} \frac{1}{h}\int (y - x)\,\frac{p(h, x, y)\,v(y)}{v(x)}\,dy = \lim_{h\downarrow 0} \frac{1}{h\,v(x)}\int (y - x)\,p(h, x, y)\,v(y)\,dy.$$

Expanding v(y) about x gives

$$v(y) = v(x) + (y - x)\,v'(x) + \frac{(y - x)^2}{2}\,v''(z),$$

for some z between x and y. Then

$$\mu^*(x) = \lim_{h\downarrow 0} \frac{1}{h}\left\{\int (y - x)\,p(h, x, y)\,dy + \frac{v'(x)}{v(x)}\int (y - x)^2 p(h, x, y)\,dy + \frac{1}{2v(x)}\int (y - x)^3 v''(z)\,p(h, x, y)\,dy\right\}.$$

If we assume that $v''(x)$ is bounded, then we obtain

$$\mu^*(x) = \mu(x) + \sigma^2(x)\,\frac{v'(x)}{v(x)},$$

an expression formally the same as the result in (9.5a). In our case, $\mu(x) = 0$,
$\sigma^2(x) = x$ and, since v(x) satisfies the equation $v''(x) = -x\,u(x)$, $v''(x)$ is bounded,
and so for our model

$$\mu^*(x) = \frac{x\,v'(x)}{v(x)}.$$

In a similar way, the infinitesimal variance $\sigma^{*2}(x)$ can be shown to be given by

$$\sigma^{*2}(x) = \sigma^2(x) = x.$$

It remains to compute the new killing rate $k^*(x)$. We have

$$\Pr\{Y^* \text{ killed in } (t, t + h) \mid Y^*(t) = x\} = \Pr\{Y \text{ killed in } (t, t + h) \mid Y(t) = x,\ Y \text{ killed eventually}\} = \frac{k(x)h + o(h)}{v(x)},$$

and hence $k^*(x) = k(x)/v(x) = x^2/2v(x)$. The conditioned process $Y^*(t)$ is now a
diffusion process with killing on the interval (0, ∞), with an entrance boundary
at 0 (check this).

In order to evaluate functionals of the diffusions Y and Y*, it is useful to
compute the relevant Green functions (cf. Section 3).

Paraphrasing the constructions of Section 3 we determine a positive
solution $p_1(x)$ of (10.1) vanishing at infinity and $p_2(x)$ a second positive solution
vanishing at zero. Clearly, $p_1(x) = A(x)$ and $p_2(x) = B(x) - \sqrt{3}A(x)$ qualify.
Moreover, $p_1(x)$ is decreasing, while $p_2(x)$ is increasing. Since $\sigma^2(x) = x$ for the
unconditioned process Y(t), we can expect that

$$G(x, y) = \begin{cases} \dfrac{2\pi}{y}\,A(x)\,[B(y) - \sqrt{3}A(y)], & y \le x, \\[2ex] \dfrac{2\pi}{y}\,A(y)\,[B(x) - \sqrt{3}A(x)], & y \ge x. \end{cases} \qquad (10.2)$$

The derivation of (10.2) can be justified by confining attention to the state
space [0, r], r large, and treating 0 and r as absorbing barriers. Let $G_r(x, y)$ be
the associated Green function on [0, r]. Sending r to infinity we find that
$G_r(x, y)$ converges to G(x, y) as set forth in (10.2).

The Green function for the conditioned process Y*(t), in the spirit of
Section 9, is given by

$$G^*(x, y) = G(x, y)\,\frac{v(y)}{v(x)}.$$

The mean time to detection, conditional on detection occurring, is given by

$$M^*(x) = \int_0^\infty G^*(x, y)\,dy = \frac{2\pi A(0)A(x)}{A(0) - A(x)}\int_0^x \frac{B(y) - \sqrt{3}A(y)}{y}\left[1 - \frac{A(y)}{A(0)}\right]dy + \frac{2\pi A(0)[B(x) - \sqrt{3}A(x)]}{A(0) - A(x)}\int_x^\infty \frac{A(y)}{y}\left[1 - \frac{A(y)}{A(0)}\right]dy,$$

where we have used the fact that $v(x) = 1 - A(x)/A(0)$.
For small values of x, M*(x) is approximately constant ($M^*(x) \approx C$), from
which we deduce the following. In populations of size N, the mean time to
detection starting from one heterozygote Aa in the initial generation (conditional
on detection occurring) is of the order of $N^{1/3}$ generations. This result
stands in marked contrast to the conditional result for the random drift model
derived in (9.10) and (9.12).

Finally, we will derive the distribution of the maximum functional for the
conditioned process. That is, we find the probability w(x) = w(x; y) that
starting from Y*(0) = x, the maximum will exceed y (y > x) before detection.
So

$$w(x) = w(x; y) = \Pr\Big\{\max_{0 \le s < \infty} Y^*(s) > y \,\Big|\, Y^*(0) = x\Big\}, \qquad y > x.$$

Thus w(x) is just the probability that Y* ever gets above y before detection,
and is therefore the solution of the equation

$$\frac{x}{2}\,w''(x) + x\,\frac{v'(x)}{v(x)}\,w'(x) - \frac{x^2}{2v(x)}\,w(x) = 0,$$

satisfying w(y) = 1, w(0) < ∞. Canceling common factors and making the
substitution $w(x) = \eta(x)/v(x)$ leads to the following equation for $\eta(x)$:

$$\eta''(x) - x\,\eta(x) = 0.$$

Two linearly independent solutions of this equation are A(x) and
$[B(x) - \sqrt{3}A(x)]$. Hence,

$$w(x) = C_1\,\frac{A(x)}{v(x)} + C_2\,\frac{B(x) - \sqrt{3}A(x)}{v(x)}.$$

Since $v(x) \to 0$ as $x \to 0$, and we require w(0) < ∞, it follows that $C_1 = 0$, and
the appropriate solution is

$$w(x) = w(x; y) = \frac{B(x) - \sqrt{3}A(x)}{B(y) - \sqrt{3}A(y)}\cdot\frac{A(0) - A(y)}{A(0) - A(x)}.$$

The mean maximum number of heterozygotes L*(x) is then given by

$$L^*(x) = \int_0^\infty \Pr\Big\{\max_{0 \le s < \infty} Y^*(s) > y \,\Big|\, Y^*(0) = x\Big\}\,dy = x + \int_x^\infty w(x; y)\,dy.$$

The examples given in this section highlight the usefulness of diffusion 
processes with killing in the analysis of Markov chains. 
11: Semigroup Formulation of Continuous Time Markov Processes†


Let {X(t), t ≥ 0} be a regular time-homogeneous diffusion process on the open
interval I = (l, r). We designate by $P(t, x, y) = \Pr\{X(t) \le y \mid X(0) = x\}$ the transition
distribution function of X(t) subject to the initial distribution

$$P(0, x, y) = \begin{cases} 1, & \text{if } x \le y, \\ 0, & \text{if } x > y, \end{cases} \qquad (11.1)$$

i.e., a point distribution concentrating at x. We will assume for ease of
exposition that P(t, x, y) derives from a continuous density on (l, r), namely

$$\frac{\partial P(t, x, y)}{\partial y} = p(t, x, y) \qquad \text{for } t > 0. \qquad (11.2)$$

We now consider the family of operators $\{U_t, t > 0\}$ transforming each
bounded continuous function f on (l, r) into the function $U_t f$ by the formula

$$(U_t f)(x) = E_x[f(X(t))] = E[f(X(t)) \mid X(0) = x]. \qquad (11.3)$$

In terms of (11.2), it is useful to display (11.3) explicitly as

$$(U_t f)(x) = \int_l^r p(t, x, \eta)\,f(\eta)\,d\eta. \qquad (11.4)$$

Note that $U_t f$ is well defined for any bounded measurable function. We shall
assume throughout this section that $U_t$ preserves continuity, in that

$(U_t f)(x)$ is jointly continuous with respect to t > 0 and x in (l, r) (11.5)

provided f is piecewise continuous and bounded on (l, r).


SEMIGROUP STRUCTURE 
The operators $U_t$ enjoy the semigroup property

$$U_{t+s} = U_t U_s \qquad \text{for all } t, s > 0, \qquad (11.6)$$

in the sense that for every bounded piecewise-continuous f, we have

$$U_{t+s} f = U_t(U_s f) = U_s(U_t f). \qquad (11.7)$$

Verification of (11.7) rests on the Markov nature of the process as now
indicated. Consider

$$(U_{t+s} f)(x) = E[f(X(t + s)) \mid X(0) = x] = E\{E[f(X(t + s)) \mid X(t)] \mid X(0) = x\}.$$

† Sections 11-13 that follow are more advanced and use several basic facts and devices of real
variable and measure theory, e.g., standard properties of Lebesgue integrals. Sections 14 and 15 do
not depend on these sections.



For the moment, let $g = U_s f$, so that

$$g(z) = E[f(X(s)) \mid X(0) = z] = E[f(X(t + s)) \mid X(t) = z] \qquad \text{(by time homogeneity)},$$

and

$$g(X(t)) = E[f(X(t + s)) \mid X(t)].$$

Then

$$(U_{t+s} f)(x) = E[g(X(t)) \mid X(0) = x] = U_t g(x) = (U_t(U_s f))(x), \qquad \text{since } g = U_s f.$$

The identity in (11.7) is variously referred to as the Fokker-Planck equation or
the Chapman-Kolmogorov equation. In terms of the transition densities the
Chapman-Kolmogorov equation becomes

$$p(t + s, x, y) = \int_l^r p(t, x, z)\,p(s, z, y)\,dz. \qquad (11.8)$$
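
As a concrete check of (11.8), the sketch below uses the Gaussian transition density of standard Brownian motion on the whole line (the process of Example 1 in this section, so here l = −∞ and r = ∞) and approximates the integral over z by the trapezoid rule on a truncated grid. The function names, the truncation width, and the grid size are illustrative choices, not taken from the text.

```python
import numpy as np

def bm_density(t, x, y):
    """Transition density of standard Brownian motion: N(x, t) evaluated at y."""
    return np.exp(-(y - x) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

def trapezoid(values, step):
    """Composite trapezoid rule on an equally spaced grid."""
    return step * (values.sum() - 0.5 * (values[0] + values[-1]))

def chapman_kolmogorov_rhs(t, s, x, y, half_width=12.0, n=20001):
    """Approximate the integral of p(t, x, z) p(s, z, y) over z."""
    z = np.linspace(-half_width, half_width, n)
    return trapezoid(bm_density(t, x, z) * bm_density(s, z, y), z[1] - z[0])

if __name__ == "__main__":
    t, s, x, y = 0.3, 0.5, 0.2, -0.4
    print(bm_density(t + s, x, y), chapman_kolmogorov_rhs(t, s, x, y))
```

The two printed numbers should agree to many digits, since the integrand is a smooth, rapidly decaying Gaussian.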


The semigroup property (11.7) underlies a broad spectrum of theory and 
applications of dynamical systems; e.g., the reference Hille and Phillips†
describes semigroup constructions pertinent to heat dissipation, wave pro- 
pagation, biological population evolution, functional analysis for summability 
methods, approximation theory, and eigenfunction representations. For our 
purposes and to provide perspective, it is instructive to characterize the nature 
of a linear transformation semigroup in the finite-dimensional case, where the 
key concepts are more accessible. Later in this and the following section we 
shall elaborate on the semigroup approach to the study of general Markov 
processes. 

Recall that every nonconstant real valued function U(t) satisfying
U(t + s) = U(t)U(s) for t, s > 0, and |U(t)| ≤ 1 over t > 0 is necessarily of the
form $U(t) = e^{At}$ for some negative constant A. When U(t) is continuous at
t = 0, so that $\lim_{t\downarrow 0} U(t) = 1$, then A is finite, U(t) is differentiable, and
$A = dU(t)/dt\,|_{t=0}$. For obvious reasons, the coefficient A is called the infinitesimal
generator of the semigroup. Where t > 0 we have

$$\frac{dU(t)}{dt} = A\,U(t) = U(t)\,A.$$

† E. Hille and R. S. Phillips, "Functional Analysis and Semi-Groups," Colloquium
Publications Series, Vol. 31, Amer. Math. Soc., Providence, Rhode Island, 1974.
In a similar manner, a nonsingular continuous semigroup of finite matrices
{U(t), t ≥ 0} ($U_{t+s} = U_t U_s$ here signifies matrix multiplication) can be represented
in the form

$$U(t) = e^{At},$$

where the infinitesimal matrix A is obtained by differentiation:

$$A = \lim_{h\downarrow 0} \frac{U(h) - I}{h}. \qquad (11.9)$$

(Compare to Theorems 2.1 and 2.2, Chapter 14.)
Laplace transforms provide another method to ascertain A. In the one-dimensional
case we define

$$R_\lambda = \int_0^\infty e^{-\lambda t}\,U(t)\,dt, \qquad \lambda > 0.$$

Then

$$R_\lambda = \int_0^\infty e^{-\lambda t} e^{At}\,dt = \int_0^\infty e^{-(\lambda - A)t}\,dt = (\lambda - A)^{-1},$$

so that

$$\lambda - A = R_\lambda^{-1}, \qquad \text{and} \qquad A = \lambda - R_\lambda^{-1} \quad \text{for any } \lambda > 0.$$

The similar result $\lambda I - A = R_\lambda^{-1}$ holds in the matrix case, where $R_\lambda^{-1}$ is the
matrix inverse to $R_\lambda$ and I is the identity matrix.
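
The finite-dimensional statements can be tried out numerically. The sketch below is a minimal illustration under stated assumptions: the 3 × 3 symmetric matrix A is a hypothetical generator chosen for the example, the matrix exponential is computed from its eigendecomposition (valid because A is symmetric), and the Laplace transform is approximated by a truncated trapezoid rule.

```python
import numpy as np

A = np.array([[-1.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 1.0, -1.0]])      # hypothetical symmetric generator matrix
eigvals, eigvecs = np.linalg.eigh(A)  # A = V diag(w) V^T since A is symmetric

def U(t):
    """Semigroup U(t) = exp(At), computed from the eigendecomposition of A."""
    return (eigvecs * np.exp(t * eigvals)) @ eigvecs.T

def generator_estimate(h=1e-6):
    """Difference quotient (U(h) - I)/h approximating A as in (11.9)."""
    return (U(h) - np.eye(3)) / h

def laplace_transform(lam, t_max=40.0, n=40001):
    """Trapezoid approximation of R_lam = int_0^inf exp(-lam t) U(t) dt."""
    t = np.linspace(0.0, t_max, n)
    step = t[1] - t[0]
    modes = np.exp(np.outer(t, eigvals - lam))   # exp(-lam t) exp(w_k t) per eigenvalue
    integral = step * (modes.sum(axis=0) - 0.5 * (modes[0] + modes[-1]))
    return (eigvecs * integral) @ eigvecs.T

if __name__ == "__main__":
    lam = 1.5
    print(np.round(generator_estimate(), 4))
    print(np.round(lam * np.eye(3) - np.linalg.inv(laplace_transform(lam)), 4))
```

Both printed matrices should be close to A: the first recovers it by differentiation at the origin, the second through $A = \lambda I - R_\lambda^{-1}$.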

The theory of semigroups of operators takes these finite-dimensional 
results of real analysis and generalizes them to function valued operators of 
functions. The development is a powerful tool for the study of continuous time 
Markov processes, as will emerge later in this section. 

Returning to the diffusion process with $U_t$ given in (11.3), we define the
associated resolvent operators $R_\lambda$ of the process by

$$(R_\lambda f)(x) = \int_0^\infty e^{-\lambda t}\,(U_t f)(x)\,dt, \qquad \lambda > 0, \qquad (11.10)$$

where, as before, f is a bounded piecewise continuous function on (l, r).
By interchanging the order of integration, we secure the resolvent as a
kernel operator

$$R_\lambda f(x) = \int_l^r G_\lambda(x, y)\,f(y)\,dy, \qquad (11.11)$$

where

$$G_\lambda(x, y) = \int_0^\infty e^{-\lambda t}\,p(t, x, y)\,dt. \qquad (11.12)$$

It is often feasible to permit λ = 0 in (11.12). In this case the kernel $G_0(x, y)$ is
intimately related to appropriate Green functions of Section 3.

By analogy with (11.9), the infinitesimal operator of $\{U_t, t \ge 0\}$ is defined as
the derivative at the origin of the semigroup, defined in a suitable function
space sense. Observe that $U_t \to I$, the identity operator, as $t \downarrow 0$, since the
transition density concentrates at X(0) when $t \downarrow 0$ (see (11.1)). We define A as
the appropriate limit of the operator sequence $(1/h)(U_h - I)$:

$$\lim_{h\downarrow 0} \frac{1}{h}\,[U_h - I] = A. \qquad (11.13)$$

The characterization of the domain of A is a delicate matter, and the
meaning ascribed to the operator representation $U_t = e^{At}$ requires care.
These issues, and some applications, are developed later.


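Example 1. Standard Brownian Motion. For any f(x) bounded and continuous
on (−∞, ∞), we have

$$(U_t f)(x) = \frac{1}{\sqrt{2\pi t}}\int_{-\infty}^{\infty} e^{-(x - y)^2/2t}\,f(y)\,dy. \qquad (11.14)$$

The image function $U_t f$ is also continuous on (−∞, ∞).
The corresponding resolvent kernel has the form

$$G_\lambda(x, y) = \frac{1}{\sqrt{2\lambda}}\,e^{-\sqrt{2\lambda}\,|x - y|}, \qquad \lambda > 0. \qquad (11.15)$$

In order to verify (11.15) we need to establish the formula

$$J = \int_0^\infty \frac{1}{\sqrt{2\pi t}}\,e^{-\lambda t - x^2/2t}\,dt = \frac{1}{\sqrt{2\lambda}}\,e^{-\sqrt{2\lambda}\,|x|}.$$

The change of variable $t = s^2$ on the left gives

$$J = \sqrt{2/\pi}\int_0^\infty e^{-\lambda s^2 - x^2/2s^2}\,ds.$$

Next, the substitutions $c = |x|/\sqrt{2\lambda}$, $\beta = |x|\sqrt{\lambda/2}$, and $u = s/\sqrt{c}$ transform J
into

$$J = \sqrt{\frac{2c}{\pi}}\int_0^\infty e^{-\beta(u^2 + u^{-2})}\,du = \sqrt{\frac{2c}{\pi}}\cdot\frac12\int_0^\infty \left(1 + \frac{1}{u^2}\right) e^{-\beta(u^2 + u^{-2})}\,du$$

$$= \sqrt{\frac{c}{2\pi}}\;e^{-2\beta}\int_{-\infty}^{\infty} e^{-\beta v^2}\,dv, \qquad \text{where } v = u - \frac{1}{u},$$

$$= \sqrt{\frac{c}{2\pi}}\;e^{-2\beta}\sqrt{\frac{\pi}{\beta}} = \frac{1}{\sqrt{2\lambda}}\,e^{-\sqrt{2\lambda}\,|x|}.$$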
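
Formula (11.15) can also be checked by direct quadrature. The sketch below is illustrative only: it uses the substitution $t = s^2$ described above to remove the singularity at t = 0 and then applies a plain trapezoid rule on a truncated grid; the function names, the cutoff `s_max`, and the grid size are assumptions made for the example.

```python
import math
import numpy as np

def resolvent_kernel_closed_form(lam, x, y):
    """Right side of (11.15): exp(-sqrt(2 lam)|x - y|) / sqrt(2 lam)."""
    return math.exp(-math.sqrt(2.0 * lam) * abs(x - y)) / math.sqrt(2.0 * lam)

def resolvent_kernel_quadrature(lam, x, y, s_max=12.0, n=200001):
    """Laplace transform of the Brownian density after substituting t = s^2."""
    d2 = (x - y) ** 2
    s = np.linspace(1e-8, s_max, n)          # start just above 0 to avoid 0/0
    integrand = np.exp(-lam * s ** 2 - d2 / (2.0 * s ** 2))
    step = s[1] - s[0]
    integral = step * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return math.sqrt(2.0 / math.pi) * integral

if __name__ == "__main__":
    lam, x, y = 1.0, 0.7, 0.0
    print(resolvent_kernel_closed_form(lam, x, y),
          resolvent_kernel_quadrature(lam, x, y))
```

The cutoff `s_max = 12` is adequate for λ of order one; much smaller λ would need a longer integration range.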
We will now show that for standard Brownian motion, the domain of the
infinitesimal operator A includes the set of all bounded continuous functions f
possessing second derivatives f'' which are themselves bounded and continuous
and vanishing as $|x| \to \infty$. For such functions f, we also show that

$$(Af)(x) = \tfrac12 f''(x) \qquad \text{for } -\infty < x < \infty. \qquad (11.16)$$

Later we will provide an exact characterization of the infinitesimal operator
for Brownian motion. To achieve the formula (11.16), we begin by
changing variables according to $y = x + z\sqrt{t}$ in the formula (11.14) for
$(U_t f)(x)$ to obtain

$$(U_t f)(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,f(x + z\sqrt{t})\,dz.$$

Next, expand $f(x + z\sqrt{t})$ in a Taylor series in $z\sqrt{t}$ about the point x. Then

$$f(x + z\sqrt{t}) = f(x) + z\sqrt{t}\,f'(x) + \tfrac12 z^2 t\,f''(x) + R_t(z),$$

where $R_t(z)$ is the remainder term and is of the order $O(t^{3/2})$ for t small. We
need the facts that $\int_{-\infty}^{\infty} (1/\sqrt{2\pi})\,z\,e^{-z^2/2}\,dz = 0$ and $\int_{-\infty}^{\infty} (z^2/\sqrt{2\pi})\,e^{-z^2/2}\,dz = 1$. It is
elementary to check that

$$r(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,R_t(z)\,dz$$

is of order less than t; that is, $r(t)/t \to 0$ as $t \to 0$. These facts then entail

$$A_t f(x) = \frac{1}{t}\{U_t f(x) - f(x)\} \quad (A_t \text{ defined by this equation}) \quad = \tfrac12 f''(x) + \frac{r(t)}{t},$$

and hence

$$(Af)(x) = \lim_{t\downarrow 0} A_t f(x) = \tfrac12 f''(x),$$

with the convergence uniform with respect to x, −∞ < x < ∞. We have thus
established (11.16) for the collection of f as prescribed.
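
The convergence of the difference quotient $A_t f$ to $\tfrac12 f''$ can be observed numerically. The sketch below is a minimal illustration: the test function $f(x) = e^{-x^2}$ is an arbitrary choice satisfying the stated hypotheses, and the Gaussian integral in $U_t f$ is evaluated by Gauss-Hermite quadrature via NumPy's `hermegauss`, whose nodes and weights correspond to the weight $e^{-z^2/2}$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(80)
weights = weights / np.sqrt(2.0 * np.pi)   # normalize to the standard normal law

def f(x):
    """Test function (an arbitrary smooth choice): f(x) = exp(-x^2)."""
    return np.exp(-x ** 2)

def half_f_second(x):
    """(1/2) f''(x) for f(x) = exp(-x^2)."""
    return (2.0 * x ** 2 - 1.0) * np.exp(-x ** 2)

def semigroup(t, x):
    """(U_t f)(x) = E[f(x + sqrt(t) Z)], Z standard normal, by quadrature."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return (weights * f(x[:, None] + np.sqrt(t) * nodes)).sum(axis=1)

def difference_quotient(t, x):
    """A_t f(x) = (U_t f(x) - f(x)) / t."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return (semigroup(t, x) - f(x)) / t

if __name__ == "__main__":
    xs = np.linspace(-3.0, 3.0, 13)
    for t in (1e-1, 1e-2, 1e-3):
        err = np.max(np.abs(difference_quotient(t, xs) - half_f_second(xs)))
        print(f"t = {t:g}: sup-norm error on the grid = {err:.2e}")
```

For this smooth f the error shrinks roughly in proportion to t, consistent with the uniform convergence asserted in the text.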


THE GENERAL SEMIGROUP FORMULATION OF CONTINUOUS TIME MARKOV PROCESSES 


Consider the state space as a locally compact metric space which we designate
as S. The prototypic case takes S to be an open region such as a sphere in $E^p$
(Euclidean p-space), all of $E^p$, or an orthant of $E^p$. In one dimension S is
usually an interval such as (l, r) or [l, r] where $-\infty \le l < r \le \infty$. When S is
not compact we execute a standard one-point compactification by adding an
infinite point ∞ whose neighborhoods are defined to be the complements of
compact sets in S. The resulting compact space is denoted by S*.
For S = (l, r) on the real line we usually compactify S by adding l and r,
thereby closing S to S* = [l, r].

Let C(S*) denote the set of all continuous functions on S* where for f(x) in
C(S*) the value at ∞ exists finitely. It is convenient to also adjoin to S* an
isolated killing (cemetery) state "Δ" and prescribe f(Δ) = 0 for all $f \in C(S^*)$.

In the specific case of S = (l, r) and S* = [l, r], the function class C(S*)
consists of all continuous bounded functions obeying the conditions that
$\lim_{x\uparrow r} f(x)$ and $\lim_{x\downarrow l} f(x)$ exist, but are not necessarily the same. When r (or l)
belongs to S, then we assume f is continuous at these boundary points.

We denote by $C_0(S^*) = C_0(S)$ the subset in C(S) of functions which tend to
zero outside increasing compact sets. The collection of functions C(S*) is a
linear space (i.e., if f and g are in C(S*), then f + g is also in C(S*)) endowed
with the norm $\|f\| = \sup_{x\in S}|f(x)|$, which obviously obeys the triangle inequality
$\|f + g\| \le \|f\| + \|g\|$.

We consider a strong Markov process {X(t), t ≥ 0} (not necessarily a
diffusion), with state space S*. We posit the existence of a transition probability
distribution

$$P(t, x, A) = \Pr\{X(t) \in A \mid X(0) = x\} \qquad (11.17)$$

defined for t > 0, $x \in S^*$ and A traversing the usual subsets of S* (i.e., Borel
sets of S* plus sets of probability zero). A killing event passing the process
to state Δ may also occur (cf. Section 1). The transition
distribution is possibly only substochastic on S*, i.e., $P(t, x, S^*) \le 1$, but always
stochastic on $S^* \cup \{\Delta\}$, signifying

$$P(t, x, S^* \cup \{\Delta\}) = 1. \qquad (11.18)$$

The contingency of P(t, x, S*) < 1 is present when a killing event occurs with
positive probability.

To ease the exposition we shall stipulate the existence of a transition
density p(t, x, y) for t > 0 with

$$P(t, x, A) = \int_A p(t, x, y)\,dy \qquad \text{for } A \subset S \text{ and } x \in S.$$

In almost all applications, p(t, x, y) is available. Unless stated otherwise we
assume throughout that

p(t, x, y) is jointly continuous in t > 0 and x, y in S. (11.19)


THE SEMIGROUP OPERATORS 


We define the family of semigroup operators over C(S*) by

$$U_t f(x) = E_x[f(X(t))] \qquad \text{for } t > 0,\ x \in S. \qquad (11.20)$$

When f belongs to $C_0(S)$, also vanishing at Δ by convention, then (11.20)
becomes

$$U_t f(x) = E_x[f(X(t))] = \int_S p(t, x, y)\,f(y)\,dy. \qquad (11.21)$$

We assume the following properties of the semigroup operators $U_t$, t > 0:

$$U_t : C(S^*) \to C(S^*) \qquad (11.22a)$$

(i.e., $U_t f$ is also a function in C(S*) if f is in C(S*));

$U_t f(x)$ is jointly continuous in t > 0 and x in S*; (11.22b)

$$U_0 = I \quad \text{(the identity operator)} \qquad (11.22c)$$

(this is the natural requirement that P(0+, x, A) is the point distribution
concentrating at x);

$$U_{t+s} = U_t U_s, \qquad t, s > 0. \qquad (11.22d)$$

This semigroup identity is essentially tantamount to the Markov property
(cf. (11.7) and (11.8)).

$$\|U_t f - f\| = \sup_{x \in S}|U_t f(x) - f(x)| \to 0 \qquad \text{as } t \downarrow 0. \qquad (11.22e)$$

This expresses strong continuity at t = 0. Sometimes a weak continuity
postulate suffices, in that $U_t f(x) \to f(x)$ pointwise with $\|U_t f\|$ uniformly bounded
for f in C(S*) already implies the strong continuity of (11.22e).

$$U_t 1 \le 1 \quad \text{(1 is the unit function on } S\text{)}. \qquad (11.22f)$$

This condition holds because P is a probability measure on $S^* \cup \{\Delta\}$.

$U_t$ maps nonnegative functions into nonnegative functions (11.22g)

(because P is a nonnegative measure).

If we define a norm on the operator by $\|U_t\| = \sup_{f \in C(S^*),\ \|f\| \le 1} \|U_t f\|$, then
(11.22f) and (11.22g) together imply $\|U_t\| \le 1$, i.e., $U_t$ is a contraction semigroup.
In fact,

$$|U_t f(x)| \le \int p(t, x, y)\,|f(y)|\,dy \le \|f\| \int p(t, x, y)\,dy \le \|f\|. \qquad (11.22h)$$

We say that a Markov semigroup $U_t$ has the (strong) Feller property if $U_t f(x)$ is
continuous on S* for any f(x) bounded and measurable (e.g., piecewise
continuous) on S.


Example 2. Standard Brownian Motion in $E_n$, n Dimensions. (Notation: for
$x = (x_1, \ldots, x_n) \in E_n$, let $\|x\|^2 = \sum_i x_i^2$.) In this case it is customary to take the
one-point compactification $S^* = E_n \cup \{\infty\}$, identifying $C(S^*) = C(E_n^*)$, comprised
of all continuous functions on $E_n$ where $\lim_{\|x\|\to\infty} f(x)$ has a single limit
independent of the direction in which $\|x\| \to \infty$. The transition density is

$$p(t, x, y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{(x_i - y_i)^2}{2t}\right) = \frac{1}{(\sqrt{2\pi t})^n}\exp\left(-\frac{1}{2t}\|x - y\|^2\right). \qquad (11.23)$$

The corresponding semigroup operators for f in C(S*) are

$$U_t f(x) = \int_{E_n} \frac{1}{(\sqrt{2\pi t})^n}\exp\left(-\frac{1}{2t}\|x - y\|^2\right) f(y)\,dy. \qquad (11.24)$$


Example 3. An Ornstein—-Uhlenbeck Process in n Dimensions. One version 
of an Ornstein—Uhlenbeck process in n dimensions is to take the variance 
coefficient constant, the same as that of Brownian motion, but to take the 
mean infinitesimal displacement to be that of a restoring force directly 
proportional to the distance from the origin. In the notation of (1.2”) and (1.3”) 
of Section 1, y 


E 1 if i=j 
al = — = — 2 . = i 
u(x) Ixil 2 x?, and cx) b if ix]. 


These parameters in one dimension reduce to the classical Ornstein- 
Uhlenbeck infinitesimal rates (cf. Example C, Section 2). 
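A transition density function of the corresponding radial process is


$$p(t, x, y) = \frac{2\,y^{n-1} e^{-y^2}}{1 - e^{-2t}} \exp\left(-\frac{(x^2 + y^2)\,e^{-2t}}{1 - e^{-2t}}\right) \frac{I_{n/2-1}\!\left(\dfrac{2xy\,e^{-t}}{1 - e^{-2t}}\right)}{(xy\,e^{-t})^{n/2-1}},$$


$x = \|x\|$, $y = \|y\|$, where $I_\nu$ denotes the modified Bessel function of the first kind.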


THE RESOLVENT OPERATOR 


The resolvent is defined as the Laplace transform of $U_t f$, depending on the
parameter $\lambda > 0$:


$$(R_\lambda f)(x) = \int_0^\infty e^{-\lambda t}\,(U_t f)(x)\,dt. \qquad (11.25)$$


Elaborating the integrand where $f$ vanishes at $\infty$ and $\Delta$ yields


$$(R_\lambda f)(x) = \int_0^\infty e^{-\lambda t} \int_S p(t, x, y) f(y)\,dy\,dt = \int_S f(y)\,G_\lambda(x, y)\,dy,$$


involving the extended Green kernel $G_\lambda(x, y) = \int_0^\infty e^{-\lambda t} p(t, x, y)\,dt$. In view of
$\|U_t\| \le 1$, it is elementary to check that $\|R_\lambda f\| \le (1/\lambda)\|f\|$.
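
As a numerical illustration of the Green kernel, the sketch below evaluates $G_\lambda(x, y)$ for one-dimensional standard Brownian motion by quadrature and compares it with the closed form $e^{-\sqrt{2\lambda}|x-y|}/\sqrt{2\lambda}$, which is the kernel appearing in the Brownian resolvent of (11.15). The function names, the substitution $t = s^2$, and the truncation point `s_max` are choices made for this sketch only.

```python
import math

def green_kernel_numeric(lam, x, y, s_max=12.0, steps=60000):
    """Approximate G_lam(x, y) = int_0^inf exp(-lam*t) p(t, x, y) dt for 1-D Brownian motion.

    The substitution t = s**2 removes the 1/sqrt(t) singularity of the
    Gaussian density at t = 0 and leaves a smooth integrand in s.
    """
    d = abs(x - y)
    c = 2.0 / math.sqrt(2.0 * math.pi)

    def integrand(s):
        if s == 0.0:
            # Limit as s -> 0: the factor exp(-d^2/(2 s^2)) is 0 if d > 0 and 1 if d = 0.
            return c if d == 0.0 else 0.0
        return c * math.exp(-lam * s * s - d * d / (2.0 * s * s))

    # Composite trapezoid rule on [0, s_max].
    h = s_max / steps
    total = 0.5 * (integrand(0.0) + integrand(s_max))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h

def green_kernel_closed_form(lam, x, y):
    """Closed form exp(-sqrt(2 lam) |x - y|) / sqrt(2 lam)."""
    root = math.sqrt(2.0 * lam)
    return math.exp(-root * abs(x - y)) / root
```

Integrating the closed form over $y$ gives $1/\lambda$, which is the bound $\|R_\lambda f\| \le (1/\lambda)\|f\|$ attained at $f \equiv 1$.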

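We assume henceforth that $R_\lambda f(x)$ is in $C(S^*)$ for any $f$ bounded and
measurable. This property is satisfied in most practical cases.
The following resolvent equation holds:
$$R_\lambda - R_\mu = (\mu - \lambda) R_\lambda R_\mu. \qquad (11.26)$$
Proof. For any $f$ in $C(S^*)$ we need to verify the equation
$$(\mu - \lambda) \int_0^\infty e^{-\lambda t}\, U_t \left( \int_0^\infty e^{-\mu s}\, U_s f(x)\,ds \right) dt = \int_0^\infty (e^{-\lambda t} - e^{-\mu t})\, U_t f(x)\,dt.$$


Because $U_t$ is a bounded linear operator, we can move $U_t$ to the inner integral,
yielding


$$\int_0^\infty e^{-\lambda t}\, U_t \left( \int_0^\infty e^{-\mu s}\, U_s f(x)\,ds \right) dt = \int_0^\infty e^{-\lambda t} \left( \int_0^\infty e^{-\mu s}\, U_t U_s f(x)\,ds \right) dt$$
$$= \int_0^\infty e^{-\lambda t} \left( \int_0^\infty e^{-\mu s}\, U_{t+s} f(x)\,ds \right) dt,$$


the last equation resulting from the semigroup property. Now the change
of variable $\tau = s + t$ gives


$$= \int_0^\infty e^{-(\lambda - \mu) t} \left( \int_t^\infty e^{-\mu \tau}\, U_\tau f(x)\,d\tau \right) dt,$$


and by interchanging orders of integration


$$= \int_0^\infty e^{-\mu \tau}\, U_\tau f(x) \left( \int_0^\tau e^{-(\lambda - \mu) t}\,dt \right) d\tau$$


$$= \frac{1}{\lambda - \mu} \int_0^\infty U_\tau f(x)\, e^{-\mu \tau} \left[ 1 - e^{-(\lambda - \mu)\tau} \right] d\tau$$


$$= \frac{1}{\lambda - \mu} \left[ R_\mu f(x) - R_\lambda f(x) \right].$$
Combining these equations, the verification of (11.26) is achieved. ■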
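
The resolvent equation can be checked by hand in a finite-state analogue. The sketch below is an illustration outside the diffusion setting of the text: it assumes a two-state continuous-time Markov chain with generator $Q = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}$, for which $R_\lambda = (\lambda I - Q)^{-1}$ can be written out explicitly. The rates `a`, `b` and all function names are hypothetical.

```python
def resolvent(lam, a, b):
    """R_lam = (lam*I - Q)^{-1} for the two-state generator Q = [[-a, a], [b, -b]]."""
    det = lam * (lam + a + b)  # determinant of lam*I - Q
    return [[(lam + b) / det, a / det],
            [b / det, (lam + a) / det]]

def matmul(m1, m2):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(m1[i][k] * m2[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def resolvent_identity_gap(lam, mu, a, b):
    """Largest entry of R_lam - R_mu - (mu - lam) R_lam R_mu; zero up to rounding."""
    r_lam, r_mu = resolvent(lam, a, b), resolvent(mu, a, b)
    prod = matmul(r_lam, r_mu)
    return max(abs(r_lam[i][j] - r_mu[i][j] - (mu - lam) * prod[i][j])
               for i in range(2) for j in range(2))

def contraction_bound_holds(lam, a, b):
    """Entries of R_lam are nonnegative and each row sums to 1/lam,
    so the sup-norm bound ||R_lam f|| <= ||f|| / lam holds."""
    r = resolvent(lam, a, b)
    return all(abs(sum(row) - 1.0 / lam) < 1e-12 and min(row) >= 0.0 for row in r)
```

For instance, `resolvent_identity_gap(0.7, 2.3, 1.5, 0.4)` should vanish up to floating-point rounding, mirroring (11.26).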


As a corollary of (11.26) we infer the important fact: 


Proposition 11.1. The range of the resolvent operator (i.e., the collection of all
$g(x) = R_\lambda f(x)$ for some $f \in C(S^*)$) is independent of $\lambda$ for $\lambda > 0$.


Proof. Consider $g(x) = R_\lambda f(x)$ in the range of $R_\lambda$. Then with the aid of
(11.26), we have
$$g(x) = R_\lambda f(x) = R_\mu [f(x) + (\mu - \lambda) R_\lambda f(x)] = R_\mu h(x)$$


with $h(x) = f(x) + (\mu - \lambda) R_\lambda f(x)$, so that $g(x)$ is also in the range of $R_\mu$. The
argument is plainly symmetric in $\lambda$ and $\mu$. ■
Another basic corollary emanating from the resolvent equation follows.
Proposition 11.2. $R_\lambda f(x) = 0$ for some $\lambda > 0$ entails $f(x) = 0$.


Proof. From the resolvent equation, we find that $R_\lambda f(x) = 0$ for all $\lambda > 0$,
i.e., $\int_0^\infty e^{-\lambda t}(U_t f)(x)\,dt = 0$ for all $\lambda > 0$ and $x$ in $S^*$. The uniqueness of the
Laplace transform implies that $U_t f(x) = 0$ for all $t > 0$ and $x$ in $S^*$, since
$U_t f(x)$ is continuous in $t$ and $x$ (see (11.22b)). Now let $t \downarrow 0$; by property
(11.22e), i.e., strong continuity at zero, the conclusion that $f(x) = 0$ ensues. ■


THE INFINITESIMAL OPERATOR 
We develop several equivalent definitions. 
Definition 11.1. For every $u(x)$ in $\mathcal{D}$, the range of $R_\alpha$ for some $\alpha > 0$
(Proposition 11.1 tells us that $\mathcal{D}$ is independent of $\alpha > 0$), we define
$$A u = \alpha u - f \quad \text{with} \quad R_\alpha f = u. \qquad (11.27)$$


This is meaningful since $R_\alpha g = 0$ entails $g = 0$, so the $f$ of (11.27) is uniquely
determined, and by virtue of the resolvent equation, we have


$$\alpha u - f = \beta u - g \quad \text{for} \quad u = R_\alpha f = R_\beta g. \qquad (11.28)$$
In fact, let $w = (\beta - \alpha) u - (g - f)$. Then the resolvent equation gives
$$R_\beta w = (\beta - \alpha) R_\beta (R_\alpha f) - R_\beta (g - f)$$
$$= R_\alpha f - R_\beta f - R_\beta g + R_\beta f$$
$$= R_\alpha f - R_\beta g = u - u = 0,$$


and so $w = 0$, i.e., (11.28) follows. This implies that $A$ is uniquely defined.
The next definition is more intuitive and conforms more closely to the 
elementary treatment leading up to (11.9). 


Definition 11.2. We define $\tilde{A} u = v$ provided that $u(x) \in C(S^*)$ and the limit
relation


$$\lim_{h \downarrow 0} \left\| \frac{U_h u - u}{h} - v \right\| = 0 \qquad (11.29)$$


holds for some $v \in C(S^*)$.


Our next objective is to show that $A$ and $\tilde{A}$ are equivalent operators,
meaning that they have the same domains of definition and produce the same
transformation. We write $\mathcal{D}(A) = \text{domain}(A)$ for the domain of an operator $A$.
Let $u = R_\lambda f$ in domain($A$). Consider


$$\frac{U_h u(x) - u(x)}{h} = \frac{1}{h} \left[ \int_0^\infty e^{-\lambda t}\, U_{t+h} f(x)\,dt - \int_0^\infty e^{-\lambda t}\, U_t f(x)\,dt \right]$$


$$= \frac{1}{h} \int_0^\infty e^{-\lambda t} \left[ U_{t+h} f(x) - U_t f(x) \right] dt \qquad (11.30)$$


and by repartitioning the limits of integration


$$= \frac{1}{h}\left(e^{\lambda h} - 1\right) \int_0^\infty e^{-\lambda t}\, U_t f(x)\,dt - \frac{e^{\lambda h}}{h} \int_0^h e^{-\lambda t}\, U_t f(x)\,dt.$$


Now let $h$ decrease to zero. The first term converges uniformly to $\lambda R_\lambda f$ and
the second converges uniformly to $f$ on account of the fundamental theorem of
calculus and the fact that $\|U_\xi f - f\| \to 0$ as $h \downarrow 0$, where $0 < \xi < h$. Thus,
$$\lim_{h \downarrow 0} \left\| \frac{U_h R_\lambda f - R_\lambda f}{h} - (\lambda R_\lambda f - f) \right\| = 0.$$


In accordance with Definition 11.2, we have that $u = R_\lambda f$ is in domain($\tilde{A}$) and
$\tilde{A} u = \lambda u - f$, which shows that $\tilde{A}$ is an extension of $A$ ($\tilde{A} \supset A$), as the two
operators coincide on the domain of definition of $A$.

It remains to show that $A$ is an extension of $\tilde{A}$, $A \supset \tilde{A}$. To this end, let $u$ be
in domain($\tilde{A}$) and set $f = \alpha u - \tilde{A} u$ (which is manifestly in $C(S^*)$). Put $v = R_\alpha f$,
which we have just shown is in domain($\tilde{A}$) (because $\tilde{A} \supset A$ is already established),
and $\tilde{A} v = \alpha v - f = \alpha v - (\alpha u - \tilde{A} u)$. Therefore


$$\tilde{A}(v - u) = \alpha (v - u).$$
From the definition of $\tilde{A}$, we have
$$U_h (v - u) = v - u + h\alpha (v - u) + o(h) = (1 + h\alpha)(v - u) + o(h).$$


Since $U_h$ is a contraction operator, $\|v - u\| \ge \|U_h (v - u)\| =
(1 + h\alpha)\|v - u\| + o(h)$; canceling the common additive term, dividing by
$h$, and then sending $h \downarrow 0$ establishes $0 \ge \alpha \|v - u\|$ and hence $v = u$. Therefore,
$u$ in $\mathcal{D}(\tilde{A})$ implies $u = v$ in $\mathcal{D}(A)$, and so $\mathcal{D}(\tilde{A}) \subset \mathcal{D}(A)$. Therefore, their
domains coincide and $A = \tilde{A}$ as claimed. ■


SOME PROPERTIES OF THE INFINITESIMAL OPERATOR A 


(i) $A$ is linear (this follows from the “derivative” definition (11.29), since
the derivative is additive). (11.31)
(ii) $\mathcal{D}(A)$ is dense in $C(S^*)$. (The range of $R_\lambda$ is dense. Check this.)
(iii) $A$ is a closed operator. This means that when $u_n \in \mathcal{D}(A)$ and $A u_n = v_n$
converge, $\|u_n - u\| \to 0$, $\|v_n - v\| \to 0$, then $u \in \mathcal{D}(A)$ and $Au = v$. Equivalently,
the set of pairs $\{u, Au\}$ for $u$ traversing $\mathcal{D}(A)$ is a closed set in the norm.
We prove this last statement. Since for $\alpha > 0$
$$v_n = A u_n = \alpha u_n - f_n, \quad \text{where} \quad R_\alpha f_n = u_n, \qquad (11.32)$$


and $v_n$ and $u_n$ separately converge in the norm of $C(S^*)$, we see from (11.32)
that $f_n$ also converges in the norm of $C(S^*)$, say to $f(x)$. It follows, since $R_\alpha$ is
bounded, that $R_\alpha f_n = u_n$ passes into $R_\alpha f = u$, which means that $u$ is in the range
of $R_\alpha$ and consequently in the domain of $A$. The limit relation $v = \alpha u - f$
shows that $Au = v$.


(iv) The relation $Au = \alpha u - f$ can be expressed concisely as follows. The
operator $\alpha I - A$ ($I$ = identity) is invertible as a bounded operator on
domain($A$) and $(\alpha I - A)^{-1} = R_\alpha$ such that


$$\|(\alpha I - A)^{-1}\| \le \frac{1}{\alpha}. \qquad (11.33)$$


(See p. 292.) There is an important converse to the above description of $A$
endowed with the properties (i)-(iv), which we state without proof.


Theorem 11.1 (Hille-Yosida). Let $A$ be a linear operator with a closed dense
domain in a Banach space, say $C(S^*)$. If for some $\alpha > 0$, $\|(\alpha I - A)^{-1}\| \le 1/\alpha$, i.e.,
for each $f$ in $C(S^*)$ there exists a unique $u$ in $\mathcal{D}(A)$ such that $\alpha u - Au = f$
and $\|u\| \le (1/\alpha)\|f\|$, then there exists a strongly continuous contraction
semigroup $U_t$ (property (11.22h)) with $A$ as its infinitesimal operator. If $R_\alpha =
(\alpha I - A)^{-1}$ preserves positivity, then $U_t$ is a positivity preserving semigroup.


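Example. From (11.15) for Brownian motion on $(-\infty, \infty)$ recall that

$$R_\alpha f(x) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\alpha}}\, e^{-\sqrt{2\alpha}\,|x - y|} f(y)\,dy \quad \text{for } f(x) \in C(S^*),$$


with $S^* = [-\infty, \infty]$ the closed real line, where continuity at $\infty$ means that
$\lim_{|x| \to \infty} f(x) = f(\infty)$ exists and exhibits the same value for $x \to +\infty$ or $-\infty$.
Then


$$R_\alpha f(x) - \frac{f(\infty)}{\alpha} = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\alpha}}\, e^{-\sqrt{2\alpha}\,|x - y|} \left[ f(y) - f(\infty) \right] dy \to 0 \quad \text{as } |x| \to \infty \text{ (prove this)},$$


i.e., $R_\alpha f(\infty) = f(\infty)/\alpha$.
We next show that $u(x) = R_\alpha f(x)$ is twice continuously differentiable over
the real line. To this end, we write


$$R_\alpha f(x) = \frac{1}{\sqrt{2\alpha}}\, e^{-\sqrt{2\alpha}\,x} \int_{-\infty}^{x} f(y)\, e^{\sqrt{2\alpha}\,y}\,dy + \frac{1}{\sqrt{2\alpha}}\, e^{\sqrt{2\alpha}\,x} \int_{x}^{\infty} f(y)\, e^{-\sqrt{2\alpha}\,y}\,dy.$$

Then by the fundamental theorem of calculus,


$$\frac{du(x)}{dx} = -e^{-\sqrt{2\alpha}\,x} \int_{-\infty}^{x} f(y)\, e^{\sqrt{2\alpha}\,y}\,dy + e^{\sqrt{2\alpha}\,x} \int_{x}^{\infty} f(y)\, e^{-\sqrt{2\alpha}\,y}\,dy$$
and

$$u''(x) = \frac{d^2 u(x)}{dx^2} = 2\alpha R_\alpha f(x) - 2 f(x)$$


are clearly bounded and continuous on $[-\infty, \infty]$ because $f(x)$ and $R_\alpha f(x)$ are
such functions. Moreover, as $x \to \pm\infty$, $u(x) = R_\alpha f(x) \to f(\infty)/\alpha$ as indicated
earlier. The above considerations show that $Au(x) = \tfrac{1}{2}u''(x) = \alpha u(x) - f(x)$ and
also


$$u''(x) \to 0 \quad \text{as } |x| \to \infty. \qquad (11.34)$$


We can now succinctly characterize the domain of $A$ for Brownian motion
with the help of (11.34). Accordingly, we define the operator


$$\mathcal{A} u = \tfrac{1}{2} u''(x) \qquad (11.35)$$


with domain consisting of all functions for which $u$, $u'$, and $u''$ are continuous
and bounded on $[-\infty, \infty]$ and such that $u''(x) \to 0$ as $|x| \to \infty$.


Claim. $\mathcal{A} = A$, the infinitesimal operator of Brownian motion.


We pointed out in (11.16) that if $u(x)$ and $u''(x)$ are continuous and
bounded with $u''$ vanishing at $\infty$ then $u \in \mathcal{D}(\mathcal{A})$. Thus, $\mathcal{A}$ is an extension of $A$.
Consider now $v \in \mathcal{D}(\mathcal{A})$. Define $f(x) = \alpha v(x) - \tfrac{1}{2}v''(x)$ with $\alpha > 0$. Then
$f(\infty) = \alpha v(\infty)$ and $f \in C(S^*)$ ($S^* = [-\infty, \infty]$, with $+\infty$ and $-\infty$ identified). Let
$w = R_\alpha f$ in $\mathcal{D}(A) \subset \mathcal{D}(\mathcal{A})$. Note that $\alpha(w - v) = \mathcal{A}(w - v)$ and therefore
$\alpha z(x) - \tfrac{1}{2}z''(x) = 0$ with $z = w - v$. The general solution of this differential
equation is $z(x) = c_1 e^{-\sqrt{2\alpha}\,x} + c_2 e^{\sqrt{2\alpha}\,x}$. This is supposed to be bounded for all
real $x$, which compels $c_1 = c_2 = 0$ and $z(x) \equiv 0$. Therefore, $w = v$ and $v$ belongs
to $\mathcal{D}(A)$. Thus, $\mathcal{D}(A) = \mathcal{D}(\mathcal{A})$ and $A = \mathcal{A}$ as asserted.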
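
The relation $\tfrac{1}{2}u'' = \alpha u - f$ for $u = R_\alpha f$ can be observed numerically. The following is a minimal sketch, assuming the test function $f(y) = e^{-y^2}$ (which lies in $C(S^*)$ with $f(\infty) = 0$); the quadrature grid, the truncation `half_width`, and the difference step `delta` are arbitrary choices for the illustration.

```python
import math

def resolvent_bm(f, alpha, x, half_width=12.0, h=1e-3):
    """Trapezoid approximation of
    R_alpha f(x) = int exp(-sqrt(2 alpha) |x - y|) / sqrt(2 alpha) * f(y) dy."""
    root = math.sqrt(2.0 * alpha)
    n = int(round(2.0 * half_width / h))
    total = 0.0
    for k in range(n + 1):
        # The grid is centred at x, so the kink of the kernel falls on a node.
        y = x - half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-root * abs(x - y)) / root * f(y)
    return total * h

def generator_residual(f, alpha, x, delta=0.05):
    """(1/2) u''(x) - (alpha u(x) - f(x)) for u = R_alpha f, with u'' by central differences."""
    u_minus = resolvent_bm(f, alpha, x - delta)
    u_mid = resolvent_bm(f, alpha, x)
    u_plus = resolvent_bm(f, alpha, x + delta)
    half_u2 = 0.5 * (u_plus - 2.0 * u_mid + u_minus) / (delta * delta)
    return half_u2 - (alpha * u_mid - f(x))
```

With `f = lambda y: math.exp(-y * y)`, the residual `generator_residual(f, 1.0, 0.4)` should be small, limited only by the finite-difference and quadrature errors.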


FIRST DYNKIN FORMULA 


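We present an ancillary formula before proceeding to the main result. Let
$u = R_\alpha f$ and let $\sigma$ be a Markov time. Then


$$u(x) = E_x\left[ \int_0^\sigma e^{-\alpha t} f(X(t))\,dt \right] + E_x\left[ e^{-\alpha\sigma} u(X(\sigma)) \right]. \qquad (11.36)$$

Proof. We have

$$u(x) = \int_0^\infty e^{-\alpha t}\, U_t f(x)\,dt = E_x\left[ \int_0^\infty e^{-\alpha t} f(X(t))\,dt \right]$$


$$= E_x\left[ \int_0^\sigma e^{-\alpha t} f(X(t))\,dt + \int_\sigma^\infty e^{-\alpha t} f(X(t))\,dt \right]$$


$$= E_x\left[ \int_0^\sigma e^{-\alpha t} f(X(t))\,dt \right] + E_x\left[ e^{-\alpha\sigma} \int_0^\infty e^{-\alpha t} f(X(\sigma + t))\,dt \right]$$


and using the strong Markov property


$$= E_x\left[ \int_0^\sigma e^{-\alpha t} f(X(t))\,dt \right] + E_x\left[ e^{-\alpha\sigma} u(X(\sigma)) \right]. \quad ■$$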


DYNKIN FORMULA WITH INFINITESIMAL GENERATOR 


We now present the main Dynkin formula which we shall refer to as the 
Dynkin formula to distinguish it from the first formula (11.36). 


Theorem 11.2. Assume $\sigma$ is a Markov time with finite expectation and $u(x)$ is in
$\mathcal{D}(A)$. Then


$$E_x\left[ \int_0^\sigma Au(X(t))\,dt \right] = E_x[u(X(\sigma))] - u(x). \qquad (11.37)$$


Remark. We will present two proofs of (11.37). The immediate one falls back
on the first Dynkin formula (11.36). The second proof, presented in Section 12,
involves the theory of additive processes.


Proof. Let $u = R_\alpha f$ for some $f \in C(S^*)$, which exists because by stipulation
$u \in \mathcal{D}(A)$. Also, $Au = \alpha u - f$. The first Dynkin formula provides


$$u(x) = E_x\left[ \int_0^\sigma e^{-\alpha t} f(X(t))\,dt \right] + E_x\left[ e^{-\alpha\sigma} u(X(\sigma)) \right]$$


$$= E_x\left[ \int_0^\sigma e^{-\alpha t} \{ \alpha u(X(t)) - Au(X(t)) \}\,dt \right] + E_x\left[ e^{-\alpha\sigma} u(X(\sigma)) \right]. \qquad (11.38)$$


Now


$$\left| \alpha E_x\left[ \int_0^\sigma e^{-\alpha t} u(X(t))\,dt \right] \right| \le \alpha \|u\|\, E_x[\sigma] \to 0 \quad \text{as } \alpha \downarrow 0$$


because $E_x[\sigma] < \infty$ by hypothesis. Also, $e^{-\alpha\sigma} \le 1$ goes to 1 as $\alpha \downarrow 0$ because $\sigma$
is finite for almost every sample path. We may apply the bounded convergence
theorem when $\alpha \downarrow 0$ to get $E_x[e^{-\alpha\sigma} u(X(\sigma))] \to E_x[u(X(\sigma))]$. Therefore, the right
side of (11.38) converges to


$$-E_x\left[ \int_0^\sigma Au(X(t))\,dt \right] + E_x[u(X(\sigma))]. \qquad (11.39)$$


Rearranging terms in (11.38) and (11.39) leads to (11.37). ■


APPLICATIONS OF THE DYNKIN FORMULA 


Consider standard Brownian motion $X(t) = (X_1(t), \ldots, X_n(t))$, $t \ge 0$, in $n$
dimensions. Starting at $x$, and writing $\|x\|^2 = \sum_{i=1}^n x_i^2 < 1$, let $\sigma$ be the first time the
process departs the unit sphere, i.e., $\sigma$ is the first passage time to the set
$\|x\| \ge 1$. This is a Markov time (why? cf. Chapter 6).

Our task here is to calculate $E_x[\sigma]$. [We shall later, in (12.25), determine
the variance of $\sigma$.]

Paraphrasing the analysis of (11.16) we readily prove that the domain of
the infinitesimal operator $A$ of $n$-dimensional Brownian motion includes at
least those functions $w(x)$ having continuous second derivatives for which $w(x)$
and $\Delta w(x) = \sum_{i=1}^n \partial^2 w/\partial x_i^2$ ($\Delta$ is the Laplacian differential operator) converge
to zero as $\|x\| \to \infty$. Then


$$Aw = \tfrac{1}{2}\Delta w. \qquad (11.40)$$
Now define


$u(x) = 1 - \|x\|^2$ for $\|x\| \le 1$ and extended to $\|x\| > 1$ to be twice
continuously differentiable and in $C_0(E_n)$. (11.41)


Apply (11.40) to $u$, observing by continuity of paths that $\|X(\sigma)\| = 1$ and thus
$u(X(\sigma)) = 0$. Moreover, for $t < \sigma$, $u(X(t)) = 1 - \|X(t)\|^2$ and at these times
$Au(X(t)) = -n$. We do not know a priori that $E_x[\sigma] < \infty$, so we cannot directly
apply Dynkin's formula (11.37). The following procedure (via truncation) to
overcome this problem is a common device. Specifically, we define a sequence
of approximating Markov times. For each positive integer $N$, let
$\sigma_N = \sigma \wedge N = \min(\sigma, N)$. Plainly $\sigma_N$ is bounded and $E_x[\sigma_N] \le N < \infty$. Since
with probability one every sample path escapes $\|x\| < 1$, $\sigma_N$ increases to $\sigma$ as
$N \uparrow \infty$. Applying the Dynkin formula (11.37) to $\sigma_N$ and $u$ of (11.41), we obtain


$$E_x[u(X(\sigma_N))] - u(x) = E_x\left[ \int_0^{\sigma_N} Au(X(t))\,dt \right] = -n E_x[\sigma_N]. \qquad (11.42)$$


So $E_x[\sigma_N] \le (2/n)\|u\|$. Since $\sigma_N$ increases to $\sigma$, monotone convergence lets us deduce
that $E_x[\sigma] \le (2/n)\|u\|$. Now we can apply (11.37) with $\sigma$ as prescribed, and since
$u(X(\sigma)) = 0$, the Dynkin formula reduces to $u(x) = n E_x[\sigma]$ or


$$E_x[\sigma] = \frac{1 - \|x\|^2}{n}. \qquad (11.43)$$
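
Formula (11.43) lends itself to a simulation check. The sketch below is a rough Monte Carlo estimate, assuming an Euler discretization of Brownian motion with time step `dt`; because the discretized path is monitored only at grid times, the estimate is biased slightly upward, so only approximate agreement should be expected. All names and parameter values are illustrative.

```python
import math
import random

def mean_exit_time_unit_ball(x0, dt=0.002, paths=1000, seed=0):
    """Monte Carlo estimate of E_x[sigma] for Brownian motion leaving the unit ball."""
    rng = random.Random(seed)
    step_sd = math.sqrt(dt)  # standard deviation of each coordinate increment
    total = 0.0
    for _path in range(paths):
        pos = list(x0)
        t = 0.0
        # Advance the path until the first grid time at which ||X|| >= 1.
        while sum(c * c for c in pos) < 1.0:
            pos = [c + rng.gauss(0.0, step_sd) for c in pos]
            t += dt
        total += t
    return total / paths

def dynkin_exit_time(x0):
    """Closed form (11.43): E_x[sigma] = (1 - ||x||^2) / n."""
    n = len(x0)
    return (1.0 - sum(c * c for c in x0)) / n
```

For a start at the origin in two dimensions, `dynkin_exit_time((0.0, 0.0))` equals 0.5, and the simulated mean should lie close to it; shrinking `dt` and increasing `paths` tightens the agreement.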




DYNKIN REPRESENTATION OF THE INFINITESIMAL OPERATOR OF A DIFFUSION 
Consider a diffusion on $S$. A state $a$ is called a trap point (or absorbing point) if
$$\Pr\{X(t) = a \mid X(0) = a\} = 1 \quad \text{for all } t > 0. \qquad (11.44)$$
For any open subset $V$ of the state space we let


$$\sigma(V^c) = \text{first passage time into the complement of } V. \qquad (11.45)$$


We will establish at the close of this section that with $x$ in $V$,
$E_x[\sigma(V^c)] < \infty$ for most diffusions when $V$ is a bounded open sphere. In fact,
the random variable $\sigma(V^c)$ has all moments finite, and the tail of its distribution
decays exponentially fast.

With these preliminaries in mind, we have 


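Theorem 11.3. Let $X(t)$ be a diffusion on $S$. If $a$ is not a trap state, then for $u$ in
$\mathcal{D}(A)$


$$Au(a) = \lim_{\substack{V \downarrow a \\ V \text{ nbd}}} \frac{E_a[u(X(\sigma(V^c)))] - u(a)}{E_a[\sigma(V^c)]}. \qquad (11.46)$$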


Here the designation “V nbd” is an abbreviation for a bounded open region 
containing a neighborhood of a. Usually, we will take V to be a small spherical 
neighborhood of a. The symbol V | a refers to a collection of neighborhoods 
shrinking to a. For V a family of spheres with center at a then V | a means that 
the radius of V reduces to zero. 


The proof of (11.46) needs the following Lemma. 


Lemma 11.1. If $a$ is not a trap state then there exists a bounded neighborhood
$V$ of $a$ such that
$$E_a[\sigma(V^c)] < \infty.$$


We defer the proof of this lemma until after the proof of the theorem. 


Proof of Theorem 11.3. Let $V$ be a neighborhood of $a$ for which $E_a[\sigma(V^c)] <
\infty$. Obviously, if $V \supset W \ni a$ then $\sigma(W^c) \le \sigma(V^c)$ (in order to depart from $V$ it
is necessary to depart from $W$ earlier) and therefore $E_a[\sigma(W^c)] \le E_a[\sigma(V^c)]$.
By Dynkin's formula


$$E_a[u(X(\sigma(V^c)))] - u(a) - Au(a)\,E_a[\sigma(V^c)] = E_a\left[ \int_0^{\sigma(V^c)} \{ Au(X(t)) - Au(a) \}\,dt \right]. \qquad (11.47)$$

Because $Au(\cdot)$ in $C(S^*)$ is continuous at $a$ in $S$, for any prescribed $\varepsilon > 0$ we
can choose a neighborhood $W$ of $a$ small enough and contained in $V$ such that


$$|Au(x) - Au(a)| < \varepsilon \quad \text{for all } x \in W. \qquad (11.48)$$


This gives, for $W$ replacing $V$ in (11.47),


$$E_a\left[ \int_0^{\sigma(W^c)} |Au(X(t)) - Au(a)|\,dt \right] \le \varepsilon\, E_a[\sigma(W^c)]. \qquad (11.49)$$


Combining (11.49) with (11.47) produces


$$\left| \frac{E_a[u(X(\sigma(W^c)))] - u(a)}{E_a[\sigma(W^c)]} - Au(a) \right| \le \varepsilon,$$


and this inequality persists for any contraction of $W$ about $a$. Since $\varepsilon$ is
arbitrary, equality holds in the limit, i.e., (11.46) ensues. ■
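
For one-dimensional standard Brownian motion the quotient in (11.46) can be written down explicitly, which makes the limit easy to watch. The sketch below assumes $V = (a - \varepsilon, a + \varepsilon)$, so that by symmetry the exit point is $a \pm \varepsilon$ with probability $\tfrac{1}{2}$ each and $E_a[\sigma(V^c)] = \varepsilon^2$ (the case $n = 1$ of (11.43), rescaled to radius $\varepsilon$). The test function $u(x) = e^{-x^2}$ and the function names are choices made for the illustration.

```python
import math

def dynkin_quotient(u, a, eps):
    """(E_a[u(X(sigma))] - u(a)) / E_a[sigma] for 1-D Brownian motion, V = (a - eps, a + eps)."""
    mean_at_exit = 0.5 * (u(a + eps) + u(a - eps))  # exit at a - eps or a + eps, each w.p. 1/2
    expected_exit_time = eps * eps                   # E_a[sigma] = eps^2 for this interval
    return (mean_at_exit - u(a)) / expected_exit_time

def gaussian_bump(x):
    """Test function u(x) = exp(-x^2)."""
    return math.exp(-x * x)

def half_second_derivative(x):
    """(1/2) u''(x) = (2 x^2 - 1) exp(-x^2) for u(x) = exp(-x^2)."""
    return (2.0 * x * x - 1.0) * math.exp(-x * x)
```

As `eps` decreases, `dynkin_quotient(gaussian_bump, a, eps)` approaches `half_second_derivative(a)`, in agreement with $Au = \tfrac{1}{2}u''$.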


In order to prove Lemma 11.1 we develop some further characterizations 
of trap points. 


Lemma 11.2. If $a$ is not a trap point, then there exists $u(x)$ in the domain of $A$
such that $Au(a) \ne 0$.


Proof. Consider the collection of functions $u = R_\alpha f$, which span the domain
of $A$ as $f$ traverses $C(S^*)$, $\alpha > 0$. Suppose to the contrary that $Au(a) = 0$ for all
such $u$. Then $Au(a) = \alpha u(a) - f(a) = 0$ for all $f \in C(S^*)$ with $u = R_\alpha f$ for all $\alpha$.
In particular


$$\int_0^\infty e^{-\alpha t} \left[ U_t f(a) - f(a) \right] dt = 0 \quad \text{for all } \alpha > 0.$$

By the uniqueness of the Laplace transform, since $U_t f(a) - f(a)$ is continuous
in $t$, we deduce that


$$U_t f(a) - f(a) = 0 \quad \text{for all } t > 0 \text{ and } f \in C(S^*),\ a \in S,$$


that is, $E_a[f(X(t))] = f(a)$ for all $f \in C(S^*)$ and $t > 0$. Now, that $a$ is not a trap point
implies that $X(t)$ is not near $a$ for some $t$. Take $f$ to be smooth in $C(S^*)$ and
concentrated away from $a$, i.e., $f(x)$ is positive for some $x \ne a$ but $f(a) = 0$. Then for
this $f$, we have $0 \ne E_a[f(X(t))] = f(a) = 0$, which is absurd. To avert this
contradiction the statement of Lemma 11.2 necessarily prevails. ■


Proof of Lemma 11.1. By Lemma 11.2, there exists $u \in \mathcal{D}(A)$ such that
$Au(a) \ne 0$, say $Au(a) \ge \varepsilon > 0$. Let $V$ be a neighborhood of $a$ on which
$Au(x) \ge \varepsilon/2$. Define $\sigma(V^c)$ as in (11.45) and let $\sigma_N = \sigma(V^c) \wedge N$ be the
corresponding truncated Markov time. We apply Dynkin's formula with $\sigma_N$,
yielding


$$E_a[u(X(\sigma_N))] - u(a) = E_a\left[ \int_0^{\sigma_N} Au(X(t))\,dt \right] \ge \frac{\varepsilon}{2}\, E_a[\sigma_N].$$
Therefore

$$E_a[\sigma_N] \le \frac{2}{\varepsilon} \left\{ E_a[u(X(\sigma_N))] - u(a) \right\} \le \frac{4\|u\|}{\varepsilon},$$

where $\varepsilon > 0$ is independent of $N$. Letting $N \uparrow \infty$ entails $\sigma_N \uparrow \sigma(V^c)$ and the
monotone convergence theorem implies


$$E_a[\sigma(V^c)] \le \frac{4\|u\|}{\varepsilon} < \infty.$$


The proof of Lemma 11.1 is complete. ■


BOUND ON EXPECTED TIME TO EXIT AN OPEN SET 


If $x_0$ is not a trap state, there exists a neighborhood $U_0$ around $x_0$ and a time
$t_0$ such that $P(t_0, x_0, U_0^c) = \beta > 0$. We may without loss of
generality replace $U_0$ with a sphere $U$ centered at $x_0$ contained in $U_0$ such that
$\bar{U} \subset U_0$. (Specifically, take the radius of $U$ equal to one-half the minimum
distance from $x_0$ to the boundary of $U_0$.) Then


$$P(t_0, x_0, U^c) \ge \beta.$$


Let $r$ equal the radius of $U$ and consider the sequence of functions


$$f_n(x) = \begin{cases} 0, & x \in U, \\ n\left[ \|x - x_0\| - r \right], & r \le \|x - x_0\| \le r + 1/n,\ x \notin U, \\ 1, & \|x - x_0\| > r + 1/n,\ x \notin U. \end{cases}$$


By assumption (11.22a), $U_t$ maps continuous bounded functions into continuous
bounded functions (for $t > 0$) so that


$$\int P(t_0, x, dy)\, f_n(y) = \varphi_n(x)$$


is a continuous function of $x$, and in particular the set $\{x \mid \varphi_n(x) > \alpha\}$ is an open
set. Also $\bigcup_{n=1}^\infty \{x \mid \varphi_n(x) > \alpha\}$ is an open set as the union of open sets. But $f_n$
converges to the indicator function $f^*$ of $U^c$, so $\varphi_n(x) \uparrow P(t_0, x, U^c)$ as a monotone
increasing sequence of functions. Hence, for each $\alpha > 0$, $\{x \mid P(t_0, x, U^c)
> \alpha\}$ is an open set.
Consider $\tilde{V} = \{x \mid P(t_0, x, U^c) > \beta/2\}$. This is an open set containing $x_0$. Let
$V = \tilde{V} \cap U$, which is an open set containing $x_0$. Let $\sigma_{x_0}(V^c)$ designate the first
time of attaining the boundary of $V$ from $x_0$. Now $V \subset U$ entails $V^c \supset U^c$, so it
follows that $P(t_0, x, U^c) \le P(t_0, x, V^c)$ for all $x$ in $V$. Therefore, for all $x$ in $V$,


$$\Pr\{\sigma(V^c) > t_0 \mid X(0) = x\} \le P(t_0, x, V) \le P(t_0, x, U) = 1 - P(t_0, x, U^c) < 1 - \frac{\beta}{2}.$$


By the strong Markov property for $x \in V$,
$$\Pr\{\sigma(V^c) > 2t_0 \mid X(0) = x\} = \Pr\{X(s) \in V \text{ for all } s \le 2t_0\}$$


$$= \int_V \Pr\{\sigma(V^c) > t_0 \mid X(0) = z\} \times \Pr\{X(t_0) = z,\ X(s) \in V \text{ for all } s \le t_0 \mid X(0) = x\}\,dz$$


$$\le \left( 1 - \frac{\beta}{2} \right)^2,$$


since both factors are less than $1 - \beta/2$ for any state $x \in V$. By induction,


$$\Pr\{\sigma(V^c) > n t_0 \mid X(0) = x\} \le \left( 1 - \frac{\beta}{2} \right)^n \quad \text{for all } x \in V, \qquad (11.50)$$


and this inequality easily provides a bound on all moments of the exit time
from $V$.

On account of (11.50) it is easy to deduce that the tail of the distribution of
$\sigma(V^c)$ decreases exponentially fast. In particular, for a regular diffusion the first
passage random variable $\sigma(V^c)$ has all moments finite.
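
To see how (11.50) bounds the mean exit time, note that $E_x[\sigma] = \int_0^\infty \Pr\{\sigma > t\}\,dt \le t_0 \sum_{n \ge 0} \Pr\{\sigma > n t_0\} \le t_0 \sum_{n \ge 0} (1 - \beta/2)^n = 2t_0/\beta$. The short sketch below sums the series numerically and compares it with this closed form; the function names and the truncation `terms` are illustrative choices, and `t0` and `beta` stand for the constants $t_0$ and $\beta$ of the text.

```python
def tail_bound(n, beta):
    """Right side of (11.50): bound on Pr{sigma(V^c) > n * t0}."""
    return (1.0 - beta / 2.0) ** n

def mean_exit_time_bound(t0, beta, terms=10_000):
    """Bound E[sigma] by t0 * sum_n Pr{sigma > n * t0}, applying (11.50) term by term."""
    return t0 * sum(tail_bound(n, beta) for n in range(terms))

def mean_exit_time_bound_closed_form(t0, beta):
    """Geometric series summed exactly: 2 * t0 / beta."""
    return 2.0 * t0 / beta
```

Higher moments can be bounded in the same way, since the tail bound decays geometrically in $n$.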


IDENTIFICATION OF THE INFINITESIMAL OPERATOR A WITH THE DIFFERENTIAL OPERATOR L

In what follows, let $\{X(t), t \ge 0\}$ be a regular diffusion process in natural scale
($S(x) = x$), with state space $I = [l, r]$, and speed measure density
$m(x) = [\sigma^2(x)]^{-1}$ continuous and positive on $I^\circ = (l, r)$. Let $A$ be the (strong)
infinitesimal generator of the process (see Definition 11.2) and let $u \in \mathcal{D}(A)$.
By Theorem 11.3, we know that for $a \in I^\circ$,


$$\lim_{\substack{V \downarrow a \\ E_a[\sigma(V^c)] < \infty}} \frac{E_a[u(X(\sigma(V^c)))] - u(a)}{E_a[\sigma(V^c)]} \qquad (11.51)$$


exists, and equals $Au(a)$. Scrutiny of the proof of (11.46) reveals that for each
compact subinterval $J$ of $I^\circ$ the convergence in (11.46) is uniform over $a \in J$,
where we take in (11.51) $V = (a - \varepsilon, a + \varepsilon)$ contracting to $a$.
We will assume for the moment that $Au(a) > 0$ for all $a$; we treat the other
cases later.

Now since $X(\sigma(V^c)) = a - \varepsilon$ or $a + \varepsilon$, each with probability $\tfrac{1}{2}$, and since the
scale measure is $x$, we have $E_a[u(X(\sigma(V^c)))] = \tfrac{1}{2}u(a + \varepsilon) + \tfrac{1}{2}u(a - \varepsilon)$. Further,
$E_a[\sigma(V^c)] = \int_{a-\varepsilon}^{a+\varepsilon} G^*(a, \xi)\, m(\xi)\,d\xi$, where $G^*(a, \xi)$ is given by (3.30) with $S(x) = x$.


Hence we can write


$$Au(a) = \lim_{\varepsilon \downarrow 0} \frac{\dfrac{1}{\varepsilon^2}\left[ \tfrac{1}{2}u(a + \varepsilon) + \tfrac{1}{2}u(a - \varepsilon) - u(a) \right]}{\dfrac{1}{\varepsilon^2} \displaystyle\int_{a-\varepsilon}^{a+\varepsilon} G^*(a, \xi)\, m(\xi)\,d\xi}. \qquad (11.52)$$


Recall from Remark 3.2, p. 197, that the denominator converges uniformly
to $m(a) > 0$ as $\varepsilon \downarrow 0$ for $a$ restricted to a compact subinterval of $I^\circ$.

Since the convergence in (11.52) is uniform over a compact region of $I^\circ$, we
can choose $\delta$ so small that the numerator is positive for all $0 < \varepsilon < \delta$. That is,


$$\tfrac{1}{2}u(a + \varepsilon) + \tfrac{1}{2}u(a - \varepsilon) > u(a) \quad \text{for all } 0 < \varepsilon < \delta \text{ and } a \text{ in } J \subset I^\circ.$$


This implies that $u(a)$ is locally convex and therefore convex. A convex
continuous function possesses left continuous left-hand derivatives and right
continuous right-hand derivatives.† But from (11.52), these left- and right-hand
derivatives must be equal, and so $u'(x)$ is continuous. Further, the limit


$$v(a) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon^2} \left[ u(a + \varepsilon) - 2u(a) + u(a - \varepsilon) \right] \qquad (11.53)$$


exists. Since $u(x)$ is convex, $u''(x)$ exists almost everywhere, and wherever $u''(x)$
exists the fact of (11.53) and a simple two-term Taylor expansion show that
$u''(x) = v(x)$.

Finally, $v(x)$ is continuous [since $v(x) = 2m(x)Au(x)$, and both $m$ and $Au$ are
continuous] and so bounded on closed bounded intervals of $I^\circ$. So $u''(a)$ is
bounded almost everywhere and hence $u'(a)$ is absolutely continuous.
Accordingly, we can write


$$\int_x^y u''(\xi)\,d\xi = u'(y) - u'(x).$$


Hence,


$$\frac{1}{y - x} \int_x^y v(\xi)\,d\xi = \frac{1}{y - x} \int_x^y u''(\xi)\,d\xi = \frac{u'(y) - u'(x)}{y - x}.$$

As $y \to x$, the left-hand term goes to $v(x)$, entailing that
$$\lim_{y \to x} \frac{u'(y) - u'(x)}{y - x}$$
exists, and equals $v(x)$. That is, $u''(x) = v(x)$ for all $x$.

† For properties of convex functions see G. H. Hardy, J. E. Littlewood, and G. Pólya,
“Inequalities,” 2nd ed., Cambridge Univ. Press, London and New York, 1952.
So we have shown that if $u$ is in $\mathcal{D}(A)$ and $Au(a) > 0$ for all $a$, then
$$Au(a) = \tfrac{1}{2}\sigma^2(a)\,u''(a). \qquad (11.54)$$


For any $a$ such that $Au(a) = 0$, we know that since $a$ is not a trap state,
there exists $u^*$ in $\mathcal{D}(A)$ such that $Au^*(a) > 0$. Take $\bar{u} = u + Cu^*$ so that
$A\bar{u}(a) > 0$. Then $\bar{u}$ is twice differentiable at $a$, and since $Cu^*$ is, so then is $u$.
The case $Au(a) < 0$ follows in a similar way. We have now shown that $u \in \mathcal{D}(A)$
implies


$$Au(a) = Lu(a) = \tfrac{1}{2}\sigma^2(a)\,u''(a). \qquad (11.55)$$


The result (11.55) extends easily to the case including an infinitesimal drift
term $\mu(x)$ smooth over $I$. In fact, we can reduce considerations to a natural
scale by passing to the process $S(X(t)) = Y(t)$ (cf. Remark 3.1). In this way we
achieve the identification: If $w \in \mathcal{D}(A)$ then $w''(x)$ is continuous on $I^\circ$ and


$$Aw(x) = \tfrac{1}{2}\sigma^2(x)\,w''(x) + \mu(x)\,w'(x). \qquad (11.56)$$


12: Further Topics in the Semigroup Theory of 
Markov Processes and Applications to Diffusions 


We deal with five topics in this section. (I) Some characterizations of boundary 
behavior for one dimensional diffusions in terms of the infinitesimal operator 
evaluated at the boundary. (II) Ramifications of the Dynkin formula (11.37) 
using the theory of additive processes and associated martingales. (III) The 
representation of processes with killing using multiplicative functionals. (IV) 
Some aspects of local time and inverse local time with applications to 
generalized Bessel diffusion processes. (V) The construction of some classes of 
space-time martingales and examples in diffusions. 


A. SOME CHARACTERIZATIONS OF BOUNDARY BEHAVIOR FOR DIFFUSIONS BY THE NATURE OF 
THE DEFINITION OF THE INFINITESIMAL OPERATOR 

The following discussion supplements the description of boundary behavior
given in Sections 6 and 7. The emphasis here concerns the relationship of the
Feller boundary classifications, Table 6.2, and the nature of the definition of
the infinitesimal operator on or near the boundary.


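Theorem 12.1. Suppose the right end point $r$ is an exit (or regular and absorbing)
boundary, and $u(x)$ belongs to domain $A$. Then


$$\lim_{x \uparrow r} Au(x) = 0. \qquad (12.1)$$

Proof. For $u \in \mathcal{D}(A)$ = domain $A$, $u = R_\alpha f$,


$$u(x) = E_x\left[ \int_0^\infty e^{-\alpha t} f(X(t))\,dt \right] = E_x\left[ \int_0^{T_{r-}} e^{-\alpha t} f(X(t))\,dt \right] + E_x\left[ \int_{T_{r-}}^\infty e^{-\alpha t} f(X(t))\,dt \right],$$


where $T_{r-}$ is the first passage (hitting) time of the boundary $r$ (cf. Remark 7.1).
Consider first


$$\left| E_x\left[ \int_0^{T_{r-}} e^{-\alpha t} f(X(t))\,dt \right] \right| \le \frac{\|f\|}{\alpha}\, E_x\left[ 1 - e^{-\alpha T_{r-}} \right].$$


We have previously shown that $\lim_{x \uparrow r} E_x[e^{-\alpha T_{r-}}] = 1$ (see Theorem 7.2) at an
exit or regular absorbing boundary. Since an exit boundary is absorbing it
follows that


$$\lim_{x \uparrow r} u(x) = \lim_{x \uparrow r} E_x\left[ \int_{T_{r-}}^\infty e^{-\alpha t} f(X(t))\,dt \right] = f(r) \lim_{x \uparrow r} \frac{E_x[e^{-\alpha T_{r-}}]}{\alpha} = \frac{f(r)}{\alpha}.$$


Therefore $\lim_{x \uparrow r} Au(x) = \alpha u(r) - f(r) = \alpha[f(r)/\alpha] - f(r) = 0$ at an exit or
regular absorbing boundary as asserted in (12.1). ■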


Remark 12.1. Theorem 12.1 emphasizes the importance of selecting the 
correct underlying function space when performing a particular evaluation in 
connection with a diffusion process. 

To illustrate, consider the problem of finding $v(x) = E[T \mid X(0) = x]$, where
$T = T_a \wedge T_b$ and $\{X(t)\}$ is standard Brownian motion on $[a, b]$ with both $a$
and $b$ regular and absorbing. As indicated in Section 3, $v(x)$ solves the
differential equation $\tfrac{1}{2}v''(x) = -1$, $a < x < b$; $v(a) = v(b) = 0$. However, $v$ is not
in the domain of the infinitesimal operator $A$ of the process, since were it to be
so, then we would have $Av(x) = \tfrac{1}{2}v''(x) = -1$ for all $x$ in $(a, b)$, contradicting Theorem
12.1.

The problem lies in the function space C[a, b] being too restrictive for the 
computation at hand. 
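
The point of the remark can be made concrete. The differential equation $\tfrac{1}{2}v'' = -1$ with $v(a) = v(b) = 0$ has the explicit solution $v(x) = (x - a)(b - x)$, and $\tfrac{1}{2}v''$ stays equal to $-1$ right up to the boundary instead of tending to 0 as Theorem 12.1 would require of a member of the domain. The sketch below checks this by finite differences; the helper names and the step `h` are illustrative choices.

```python
def expected_absorption_time(x, a, b):
    """v(x) = (x - a)(b - x), the solution of (1/2) v'' = -1 with v(a) = v(b) = 0."""
    return (x - a) * (b - x)

def half_second_difference(v, x, h=1e-3):
    """Central-difference approximation to (1/2) v''(x)."""
    return 0.5 * (v(x + h) - 2.0 * v(x) + v(x - h)) / (h * h)

def v_unit_interval(x):
    """The case a = 0, b = 1."""
    return expected_absorption_time(x, 0.0, 1.0)
```

Evaluating `half_second_difference(v_unit_interval, x)` at points `x` approaching 1 returns values near $-1$, not values tending to 0.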


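Theorem 12.2. Suppose $r$ is an entrance boundary and $u \in \mathcal{D}(A)$. Then
$\lim_{x \uparrow r} u(x) = u(r-)$ exists and $\lim_{x \uparrow r}[(du/dS)(x)] = (du/dS)(r-) = 0$ ($S$ is the
scale measure).


Proof. Let $u = R_\alpha f$. We know by Theorem 7.2 that for $b < x < r$,
$\lim_{b \uparrow r} \lim_{x \uparrow r} E_x[e^{-\alpha T_b}] = 1$, where $T_b$ is the first time for reaching $b$. Now apply
the first Dynkin formula (11.36) for the Markov time $T_b$ to get


$$u(x) = E_x\left[ \int_0^{T_b} e^{-\alpha t} f(X(t))\,dt \right] + E_x\left[ e^{-\alpha T_b} \right] u(b).$$

Note that
$$\left| E_x\left[ \int_0^{T_b} e^{-\alpha t} f(X(t))\,dt \right] \right| \le \frac{\|f\|}{\alpha} \left( 1 - E_x\left[ e^{-\alpha T_b} \right] \right).$$
Hence,
$$\limsup_{x \uparrow r} u(x) \le \frac{\|f\|}{\alpha} \left( 1 - \lim_{x \uparrow r} E_x\left[ e^{-\alpha T_b} \right] \right) + \lim_{x \uparrow r} \left( E_x\left[ e^{-\alpha T_b} \right] \right) u(b).$$


Now, since the left side is independent of $b$, in sending $b \uparrow r$ we obtain


$$\limsup_{x \uparrow r} u(x) \le \frac{\|f\|}{\alpha} \left( 1 - \lim_{b \uparrow r} \lim_{x \uparrow r} E_x\left[ e^{-\alpha T_b} \right] \right) + \liminf_{b \uparrow r} \left( \lim_{x \uparrow r} E_x\left[ e^{-\alpha T_b} \right] u(b) \right). \qquad (12.2)$$


Taking cognizance of the fact that $\lim_{b \uparrow r} \lim_{x \uparrow r} E_x[e^{-\alpha T_b}] = 1$ (Theorem 7.2) we
have on the basis of (12.2) that $\limsup_{x \uparrow r} u(x) \le \liminf_{b \uparrow r} u(b)$ and hence the limit
exists.

Recall that at an entrance boundary $S[x, r) = \infty$ and $M[x, r) < \infty$. Since
$Au = \tfrac{1}{2}(d/dM)(d/dS)u = \alpha u - f$ we have


$$\frac{1}{2} \lim_{y \uparrow r} \left[ \frac{du}{dS}(y) - \frac{du}{dS}(x) \right] = \lim_{y \uparrow r} \int_x^y (\alpha u - f)\,dM.$$


Because $u$, $f$, and $M$ are bounded, the right-hand limit and hence the
left-hand limit exist. But if $\lim_{x \uparrow r} du(x)/dS = \gamma > 0$, then $du(x)/dS > \gamma/2$ for $x$
sufficiently near $r$. The property $S[x, r) = \infty$ is then incompatible with the fact
that $u(x)$ is bounded, since for $x$ close to $r$, $u(r) - u(x) = \int_x^r (du/dS)\,dS \ge
(\gamma/2) \int_x^r dS = \infty$. The only way to avert this contradiction is the conclusion
$\lim_{x \uparrow r} du(x)/dS = 0$. The proof is complete. ■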


Characterizing Domains of the Infinitesimal Generator 


In what follows, we take $\{X(t), t \ge 0\}$ to be a regular diffusion on $I = (l, r)$ with
$\sigma^2(x) > 0$ and continuous on $I$. We first consider the case in which both
boundaries are exit, so that $\bar{I} = [l, r]$. Let $u$ be a twice continuously differentiable
function with $Lu(x) = \tfrac{1}{2}\sigma^2(x)u''(x) + \mu(x)u'(x) \in C_0(\bar{I})$, that is, continuous
and vanishing at $l$ and $r$. We define the function $f$ by $-f = Lu - \lambda u$, for some
$\lambda > 0$, and set $w = R_\lambda f$. Since $w \in \mathcal{D}(A)$, (11.56) shows in particular that $w$ is
twice continuously differentiable on $I$. Theorem 12.1 shows that $Aw \in C_0(\bar{I})$.
Now consider $z = u - w$. Then


$$Lz = L(u - w) = \lambda u - f - Lw$$
$$= \lambda u - f - Aw = \lambda u - f - (\lambda w - f) \quad \text{(by (11.27))}$$


$$= \lambda z.$$
We would like to show that $z(x) \equiv 0$, so that $w = u$, and hence $u \in \mathcal{D}(A)$.
Since $Lu \in C_0(\bar{I})$, and $Lw(a) = Aw(a) \to 0$ as $a \to l$ or $r$, we see that $Lz \in C_0(\bar{I})$, so
that $z \in C_0(\bar{I})$. Now suppose that $z(x)$ has a local positive maximum at $x_0$
interior to $I$. Then $z'(x_0) = 0$ and $z''(x_0) \le 0$, and so $Lz(x_0) = \tfrac{1}{2}\sigma^2(x_0)z''(x_0) \le 0$,
whereas $\lambda z(x_0) > 0$. This contradiction shows that $z$ has no local positive
maxima (and, similarly, no local negative minima). Hence, $z \equiv 0$. Thus, we
have shown that if $\{X(t), t \ge 0\}$ has exit boundaries at $l$ and $r$, and if $\mathcal{J} =
\{u : u \in C(\bar{I}),\ Lu \in C_0(\bar{I})\}$, then $\mathcal{J} \subset \mathcal{D}(A)$.

As a second example, we assume the process is in natural scale on $I = (l, r)$
but that both $l$ and $r$ are entrance boundaries. By Theorem 12.2, we know that
$u \in \mathcal{D}(A)$ implies $du(x)/dS \in C_0(\bar{I})$, and that $u \in C(\bar{I})$.

Let $\mathcal{J} = \{u : u \in C(\bar{I}),\ Lu \in C(\bar{I}),\ du/dS \in C_0(\bar{I})\}$. As in the previous example,
take $u \in \mathcal{J}$ and define the function $f$ by $-f = Lu - \lambda u$, $\lambda > 0$, and let $w = R_\lambda f$.
Then $w \in \mathcal{D}(A)$ and hence $w \in C(\bar{I})$. Then if $z = u - w$, we deduce as previously
that $Lz = \lambda z$. Since both $u$ and $w$ are in $C(\bar{I})$, $z$ must also be, and $dz/dS$ is in $C_0(\bar{I})$.

If $z$ is of one sign, then integrating $Lz = (1/(2m(x)))(d/dx)(dz/dS(x)) = \lambda z(x)$
produces


$$0 \ne 2\lambda \int_l^r z(x)\,m(x)\,dx = \int_{l+}^{r-} \frac{d}{dx}\left( \frac{dz}{dS} \right) dx = \frac{dz}{dS}(r-) - \frac{dz}{dS}(l+) = 0,$$


which is clearly absurd.

Suppose next that $z(l+) > 0 > z(r-)$. Let $x_0$ be the last zero of $z(x)$ in $I$.
Then $z(x) < 0$ for $x_0 < x < r-$ and $[1/s(x_0)]\,dz(x_0)/dx = dz(x_0)/dS \le 0$.
Integrating over $(x_0, r-)$ gives


$$0 > 2\lambda \int_{x_0}^{r} z(x)\,m(x)\,dx = \frac{dz}{dS}(r-) - \frac{dz}{dS}(x_0) = -\frac{dz}{dS}(x_0) \ge 0.$$


The foregoing contradictions imply $z(x) \equiv 0$ and so $u \in \mathcal{D}(A)$. So if both
boundaries are entrance then $\mathcal{J} \subset \mathcal{D}(A)$.

Further consider the case in which both boundaries are natural. Then if
$\mathcal{D}^* = \{u : Lu \in C_0(\bar{I})\}$, similar methods show that $\mathcal{D}^* = \mathcal{D}(A)$.

For an interval $I = (l, r)$ where $l$ is an entrance boundary and $r$ a natural
boundary, the preceding analysis applies to show that the functions of
$\mathcal{J} = \{u : Lu \in C(\bar{I}),\ du(l+)/dS = 0 \text{ and } Lu(r-) = 0\}$ belong to $\mathcal{D}(A)$.


B. ADDITIVE FUNCTIONALS 


A more natural proof of the Dynkin formula (11.37) involves the concepts of
additive functionals and associated martingales. We need the concept of a
shifted sample path. Let $\omega_t^+$ designate the sample path $\omega$ shifted by a time
epoch $t$. In other words, for $\omega_t^+$ we ignore the nature of $\omega$ up to time $t$ and
delimit $\omega_t^+$ as the part of $\omega$ commencing at time $t$.
Definition 12.1. A process $C(t) = C(t, \omega)$, $t \ge 0$, is said to constitute an additive
functional adapted to an increasing family of fields $\{\mathcal{F}_t\}$ (see Section 8) if


$$C(t, \omega) \text{ is measurable } \mathcal{F}_t \text{ for each } t \qquad (12.3)$$
and with probability one,
$$C(t + s, \omega) = C(t, \omega) + C(s, \omega_t^+) \quad \text{for all } t \text{ and } s. \qquad (12.4)$$


We will operate in the setting of Brownian motion $B(t)$ as the underlying
stochastic process on which all functionals are defined, although all analogous
conclusions are also valid where $B(t)$ is replaced by any strong Markov process.


Examples of Additive Processes 


(i) Let $f(x)$ be an integrable function taken over any finite segment of the
real line. Then


$$C(t, \omega) = \int_0^t f(B(s, \omega))\,ds \quad \text{determines an additive functional.} \qquad (12.5)$$


In fact


$$C(t + s, \omega) = \int_0^{t+s} f(B(\tau, \omega))\,d\tau = \int_0^t f(B(\tau, \omega))\,d\tau + \int_t^{t+s} f(B(\tau, \omega))\,d\tau.$$


By definition of $\omega$ and $\omega_t^+$, $B(\tau, \omega) = B(\tau - t, \omega_t^+)$ for all $\tau \ge t$. The change
of variable $\tau' = \tau - t$ in the last integral gives


$$\int_t^{t+s} f(B(\tau, \omega))\,d\tau = \int_t^{t+s} f(B(\tau - t, \omega_t^+))\,d\tau = \int_0^s f(B(\tau', \omega_t^+))\,d\tau'. \qquad (12.6)$$
The relation (12.4) for the functional (12.5) ensues by virtue of (12.6).
(ii)
$$C(t) = f(B(t)) - f(B(0)) \qquad (12.7)$$


is an additive functional adapted to $\{\mathcal{F}_t\}$. (Prove this.)
(iii) The sum of additive functionals is an additive functional.


The following results on additive functionals have wide applications.
Lemma 12.1. A nonnegative additive functional $C(t, \omega)$ is increasing.
Proof. Indeed, $C(t + s, \omega) = C(t, \omega) + C(s, \omega_t^+) \ge C(t, \omega)$.


Lemma 12.2. An additive functional process $C(t, \omega)$ satisfying
$$E_x[|C(t)|] < \infty \quad \text{and} \quad E_x[C(t)] = 0 \quad \text{for all } x \text{ and } t > 0 \qquad (12.8)$$


is a martingale with respect to $\{\mathcal{F}_t\}$.




Proof. We compute
$$E[C(t + s) \mid \mathcal{F}_s] = E[C(s) \mid \mathcal{F}_s] + E[C(t, \omega_s^+) \mid \mathcal{F}_s], \qquad (12.9)$$


the equation resulting from the postulate (12.4).
By the Markov property


$$E[C(t, \omega_s^+) \mid \mathcal{F}_s] = E_{B(s)}[C(t)] = 0 \quad \text{with probability one,} \qquad (12.10)$$


the last equation by virtue of the hypothesis (12.8). By definition, $C(s)$ is
measurable $\mathcal{F}_s$ (property (12.3)), entailing


$$E[C(s) \mid \mathcal{F}_s] = C(s), \qquad (12.11)$$
so that with the information of the foregoing discussion (12.9) reduces to
$$E[C(t + s) \mid \mathcal{F}_s] = C(s), \qquad (12.12)$$


the desired martingale property. ■


The Dynkin Formula (11.37) and Additive Functionals 
Let $u(x)$ be in $\mathcal{D}(A)$. We form the process

$$Y(t) = u(X(t)) - u(X(0)) - \int_0^t Au(X(\tau))\,d\tau, \qquad t > 0. \tag{12.13}$$

We shall show that $Y(t)$ is a martingale by showing that $Y(t)$ is an additive
functional and that $E_x[Y(t)] = 0$ so that Lemma 12.2 applies. To this end,
observe first that $Y(t)$ is an additive functional by virtue of the facts of (12.5)
and (12.7). The pertinent sigma fields are those generated by the process $X(t)$,
i.e., $\mathcal{F}_t$ = {the Borel field induced by the sample values of $X(u)$, over the time
span $0 \le u \le t$}.

The next string of equalities follows by passing the expectation across the
integral sign, using the definition of the semigroup operator $U_t$, employing the
identity $dU_t/dt = U_t A$, and finally invoking the fundamental theorem of
calculus. We compute

$$E_x[Y(t)] = E_x[u(X(t))] - u(x) - E_x\left[\int_0^t Au(X(\tau))\,d\tau\right]$$

$$= U_t u(x) - u(x) - \int_0^t U_\tau Au(x)\,d\tau$$

$$= U_t u(x) - u(x) - \int_0^t \left(\frac{d}{d\tau} U_\tau u(x)\right) d\tau = 0. \tag{12.14}$$

The Dynkin formula follows by applying the optional sampling theorem
(see Theorem 3.2, Chapter 6) to the process $Y(t)$, that is, let $\sigma$ be a stopping
time with finite expectation. Then
$$E_x[Y(\sigma)] = E_x[u(X(\sigma))] - u(x) - E_x\left[\int_0^\sigma Au(X(\tau))\,d\tau\right]$$

$$= E_x[Y(t)] = 0 \quad \text{for all } t \tag{12.15}$$

in agreement with the Dynkin formula (11.37).
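
As a worked illustration of how the formula is used (a sketch that is not part of the original text), take one-dimensional standard Brownian motion, for which $A = \tfrac12 d^2/dx^2$, and let $\sigma$ be the exit time from $(-a, a)$. The choice $u(x) = x^2$ on $[-a, a]$, modified smoothly outside the interval so that $u$ lies in the domain of $A$, is an assumption made for the example.

```latex
% Assumptions: X is standard Brownian motion on the line, A = (1/2) d^2/dx^2,
% sigma = inf{t : |X(t)| = a}, and u(x) = x^2 for |x| <= a (smoothly cut off outside).
\begin{align*}
  Au(x) &= \tfrac12 u''(x) = 1 \qquad (|x| \le a), \\
  E_x[u(X(\sigma))] - u(x) &= E_x\!\left[\int_0^{\sigma} Au(X(\tau))\,d\tau\right]
      = E_x[\sigma], \\
  a^2 - x^2 &= E_x[\sigma] \qquad \text{since } |X(\sigma)| = a .
\end{align*}
```

The resulting mean exit time $E_x[\sigma] = a^2 - x^2$ agrees with the case $n = 1$, $a = 1$ of the formula $E_x[\sigma] = (1 - \|x\|^2)/n$ quoted later in this section.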


An Extension 
Let $f(x)$ be in $\mathcal{D}(A)$ and let $g(t)$ and $g'(t)$ be bounded and continuous over
$0 \le t < \infty$. Then

$$g(t+\tau)\,U_{t+\tau} f(x) - g(t)\,U_t f(x) = \int_t^{t+\tau} g'(s)\,U_s f(x)\,ds + \int_t^{t+\tau} g(s)\,U_s Af(x)\,ds. \tag{12.16}$$

The proof of (12.16) relies on the identity

$$\frac{d}{dt} U_t f = A U_t f = U_t A f$$

valid for any $f$ in $\mathcal{D}(A)$ and straightforward integration by parts. The identity
underlying the original Dynkin formula corresponds to the choice $g(s) \equiv 1$.
With (12.16) in hand and $v(x)$ in $\mathcal{D}(A)$ we form the process

$$Z(t) = g(t)\,v(X(t)) - g(0)\,v(X(0)) - \int_0^t g'(s)\,v(X(s))\,ds - \int_0^t g(s)\,Av(X(s))\,ds. \tag{12.17}$$

Claim. $Z(t)$ is a martingale with respect to the family of fields $\mathcal{F}_t$ determined
by the process realizations of $\{X(s)\}$ up to time $t$.


Proof. Observe that
$$Z(t+\tau) - Z(t) = g(t+\tau)\,v(X(t+\tau)) - g(t)\,v(X(t)) - \int_t^{t+\tau} g'(s)\,v(X(s))\,ds - \int_t^{t+\tau} g(s)\,Av(X(s))\,ds.$$

The expectation conditioned on knowing $X(u)$ up to time $t$ ($u \le t$) gives
$$E[(Z(t+\tau) - Z(t)) \mid \mathcal{F}_t] = 0 \tag{12.18}$$

since the right-hand side under the conditioning in view of the strong Markov
property merely reduces to the identity (12.15) for $x = X(t)$.
The equation (12.18) affirms that $\{Z(t), t \ge 0\}$ constitutes a martingale
adapted to $\{\mathcal{F}_t\}$. Since $E_x[Z(t)] = 0$, then for any Markov time $\sigma$ of finite
expectation satisfying

$$E_x\left[\int_0^\sigma |g'(s)|\,ds\right] < \infty \quad \text{and} \quad E_x\left[\int_0^\sigma |g(s)|\,ds\right] < \infty$$

the optional sampling theorem (Theorem 3.2 of Chapter 6) implies

$$E_x[Z(\sigma)] = E_x[Z(t)] = 0. \tag{12.19}$$
Some Specializations 
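Suppose $v, Av, A^2v, \ldots, A^kv$ belong to the domain of $A$ and suppose $\sigma$ is a
Markov time such that $E_x[\sigma^k] < \infty$. Then,
$$E_x[v(X(\sigma))] = v(x) + \sum_{m=1}^{k} \frac{(-1)^{m-1}}{m!}\,E_x[\sigma^m A^m v(X(\sigma))] + \frac{(-1)^k}{k!}\,E_x\left[\int_0^\sigma s^k A^{k+1} v(X(s))\,ds\right]. \tag{12.20}$$

In fact, the result of (12.19), $E_x[Z(\sigma)] = 0$, for the case $g(s) = s^m$, $m \ge 1$, gives

$$E_x[\sigma^m v(X(\sigma))] = m\,E_x\left[\int_0^\sigma s^{m-1} v(X(s))\,ds\right] + E_x\left[\int_0^\sigma s^m Av(X(s))\,ds\right]. \tag{12.21}$$

We define
$$a_m = \frac{(-1)^m}{m!}\,E_x\left[\int_0^\sigma s^m A^{m+1} v(X(s))\,ds\right]$$

and substitute $A^m v$ for $v$ in (12.21) to get

$$E_x[\sigma^m A^m v(X(\sigma))] = (-1)^m m!\,[a_m - a_{m-1}]. \tag{12.22}$$
This holds for $m = 1, 2, \ldots, k$. The case $k = 0$ is the original Dynkin formula,
that is

$$E_x[v(X(\sigma))] = v(x) + E_x\left[\int_0^\sigma Av(X(s))\,ds\right] = v(x) + a_0. \tag{12.23}$$

If we divide (12.22) by $(-1)^m m!$ and add up the resulting relations together with (12.23), the identity (12.20) ensues.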


Application 


We will use these ideas to calculate $E_x[\sigma^2]$ where $\sigma$ is the first departure time
from the unit sphere $\|x\|^2 < 1$ where the initial state is $X(0) = x$ and $X(t)$ is
standard Brownian motion in $E^n$. Consult page 299 for the derivation of
$E_x[\sigma] = (1 - \|x\|^2)/n$. A fully rigorous proof requires truncating $\sigma$ by
$\sigma_N = \sigma \wedge N$ and operating in terms of $\sigma_N$ as we did in the analysis of (11.43).
We leave this technical point to the reader and proceed as if $E_x[\sigma^2] < \infty$.

We apply (12.20) for $k = 1$ to the function

$$v(x) = \begin{cases} (1 - \|x\|^2)^2 & \text{if } \|x\| \le 1, \\ \text{very smooth} & \text{if } \|x\| > 1, \end{cases} \tag{12.24}$$

with $v(x)$ vanishing rapidly as $\|x\| \to \infty$. Recall from (11.40) that $Av = \tfrac12 \Delta v = \tfrac12 \sum_{i=1}^n \partial^2 v/\partial x_i^2$ so that

$$Av(x) = -2n(1 - \|x\|^2) + 4\|x\|^2 \quad \text{for } \|x\| \le 1$$
and
$$A^2 v(x) = (2n + 4)n \quad \text{for } \|x\| \le 1.$$

Since $v(X(\sigma)) = 0$ and $Av(X(\sigma)) = 4$ on the sphere $\|x\| = 1$, the equation (12.20) for $k = 1$ gives
$$0 = (1 - \|x\|^2)^2 + E_x[4\sigma] - (2n + 4)n\,E_x\left[\frac{\sigma^2}{2}\right].$$

We already determined $E_x[\sigma] = (1 - \|x\|^2)/n$ in (11.43). Hence

$$E_x[\sigma^2] = \frac{4(1 - \|x\|^2)}{(2 + n)n^2} + \frac{(1 - \|x\|^2)^2}{n(2 + n)}. \tag{12.25}$$
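
The formula (12.25) can be sanity-checked numerically. The sketch below is not from the original text; it relies on the standard moment recursion for exit times, $A\,E_x[\sigma^2] = -2E_x[\sigma]$ with zero boundary values, which is assumed here rather than derived. The radial form of $\tfrac12\Delta$ is approximated by central differences, and the function names are hypothetical.

```python
def mean_exit_time(r, n):
    """E_x[sigma] = (1 - |x|^2)/n for |x| = r, as quoted in the text."""
    return (1.0 - r * r) / n

def second_moment(r, n):
    """E_x[sigma^2] from (12.25), written as a function of r = |x|."""
    one = 1.0 - r * r
    return 4.0 * one / ((2 + n) * n * n) + one * one / (n * (2 + n))

def half_radial_laplacian(w, r, n, h=1e-4):
    """(1/2) * Laplacian of a radial function w(r) in n dimensions (central differences)."""
    d1 = (w(r + h) - w(r - h)) / (2 * h)
    d2 = (w(r + h) - 2 * w(r) + w(r - h)) / (h * h)
    return 0.5 * (d2 + (n - 1) / r * d1)

def max_residual(n):
    """Largest |A w + 2 E[sigma]| over a grid of radii in (0, 1), with w from (12.25)."""
    worst = 0.0
    for k in range(1, 10):
        r = k / 10.0
        lhs = half_radial_laplacian(lambda s: second_moment(s, n), r, n)
        worst = max(worst, abs(lhs + 2.0 * mean_exit_time(r, n)))
    return worst

residuals = {n: max_residual(n) for n in (1, 2, 3, 5)}
boundary_values = [second_moment(1.0, n) for n in (1, 2, 3, 5)]
```

The residuals are at the level of the finite-difference error and the boundary values vanish, which is consistent with (12.25) in every dimension tried.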


C. MULTIPLICATIVE FUNCTIONALS 


A positive multiplicative functional Z(t) of the process {X(t)} is such that 
log Z(t) is an additive process in the sense of Definition 12.1. Accordingly, 


$$\exp\left[-\int_0^t k(X(\tau))\,d\tau\right] \tag{12.26}$$


with k(x) bounded and continuous is a multiplicative functional (cf. (12.5)). In 
particular, the Kac functional (Section 5) reduces the contributions of X(t) by 
a multiplicative functional factor since k(x) is a nonnegative function and 
hence (12.26) operates as a killing rate. 

For a regular diffusion X(t) without killing the local behavior is governed 
by the drift and variance coefficients. The possibility of killing introduces a 
further infinitesimal rate 


$$\lim_{h \downarrow 0} \frac{1}{h} \Pr\{\text{the process is killed during the time interval } (t, t+h) \mid X(t) = x\} = k(x). \tag{12.27}$$
Let the diffusion process with killing as in (12.27) with the same drift and
variance coefficient as $X(t)$ be denoted by $Y(t)$. There is a simple procedure by
which to realize the process $Y(t)$ in terms of the sample paths of $X(t)$ which we
now describe. We will assume $k(x)$ positive and continuous defined on $S^*$ such
that

$$\int_0^\infty k(X(\tau))\,d\tau = \infty \tag{12.28}$$

for almost all sample paths of $X(t)$. Let $R$ be a positive valued random variable
following an exponential distribution with parameter 1. We sample a value of
$R$ independently of $X(t)$ and define for each sample path $\omega$ of $X(t)$ the positive
random variable

$$\zeta = \inf\left\{t : \int_0^t k(X(\tau))\,d\tau > R\right\}. \tag{12.29}$$
Now, set
$$Y(t) = \begin{cases} X(t) & \text{for } t < \zeta, \\ \Delta & \text{for } t \ge \zeta \end{cases} \tag{12.30}$$

($\Delta$ = killing state). By reasoning paralleling the analysis in Section 3, it is easily
shown that $Y(t)$ is a diffusion involving the same infinitesimal mean and
variance parameters as $X(t)$ with killing rate $k(x)$ of (12.27).
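
The construction (12.29)-(12.30) translates directly into a simulation recipe. The following is a minimal sketch under stated assumptions, not the authors' procedure in code form: the path of $X$ is a discretized Brownian motion, the integral in (12.29) is a Riemann sum, the string `"DELTA"` stands in for the killing state, and the two rate functions are illustrative choices.

```python
import math
import random

def killing_time(rate, path, dt, r_threshold):
    """First grid time at which the accumulated rate int_0^t k(X(s)) ds exceeds R.

    Returns None if the threshold is not reached on the simulated horizon.
    """
    accumulated = 0.0
    for i, x in enumerate(path):
        accumulated += rate(x) * dt
        if accumulated > r_threshold:
            return (i + 1) * dt
    return None

def killed_path(path, dt, zeta, cemetery="DELTA"):
    """Y(t) = X(t) for t < zeta and the killing state afterwards, as in (12.30)."""
    return [x if (zeta is None or i * dt < zeta) else cemetery
            for i, x in enumerate(path)]

rng = random.Random(1)
dt = 0.001
path = [0.0]
for _ in range(20000):                      # discretized Brownian path on [0, 20]
    path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))

R = rng.expovariate(1.0)                    # exponential clock with parameter 1
zeta_const = killing_time(lambda x: 2.0, path, dt, R)          # constant rate k = 2
zeta_state = killing_time(lambda x: 1.0 + x * x, path, dt, R)  # state-dependent rate
Y = killed_path(path, dt, zeta_state)
```

With a constant rate $k$ the killing time reduces to $R/k$, an exponential variable with parameter $k$, which gives a quick check that the clock is wired correctly.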

We conclude this section with a more formal description of the Kac
semigroup

$$U_t^{(q)} f(x) = H_t f(x) = E_x\left[\exp\left(\int_0^t q(X(s))\,ds\right) f(X(t))\right] \tag{12.31}$$

incorporating the multiplicative functional $\exp[\int_0^t q(X(s))\,ds]$ with rate function
$q(x)$ not necessarily negative on $C(S^*)$.

The resolvent operator for (12.31) analogous to (11.25),

$$R_\alpha^{(q)} f(x) = \int_0^\infty e^{-\alpha t} H_t f(x)\,dt \tag{12.32}$$

is well defined for $\alpha > \|q^+\| = \sup_x q^+(x)$.


Theorem 12.3. Let $A$ be the infinitesimal operator of $U_t$ corresponding to the
process $\{X(t)\}$. Then the infinitesimal generator for the semigroup $U_t^{(q)}$ is
$A_{(q)} = A + q$ with domain $\mathcal{D}(A_{(q)}) = \mathcal{D}(A)$ and $A_{(q)}u(x) = (Au)(x) + q(x)u(x)$.


Proof. We first verify the semigroup property of $U_t^{(q)}$. The additive process
$Q_t = Q_t(\omega) = \int_0^t q(X(\tau, \omega))\,d\tau$ is certainly measurable with respect to the Borel
fields $\mathcal{F}_t$ of the $X$ process.

$$H_{t+s} f(x) = E_x[e^{Q_{t+s}} f(X(t+s))] = E_x[e^{Q_t(\omega)}\,e^{Q_s(\omega_t^+)} f(X(s, \omega_t^+))],$$
where $\omega_t^+$ again refers to the sample path corresponding to $\omega$ subsequent to
time $t$. By the strong Markov property
$$H_{t+s} f(x) = E_x[e^{Q_t}\,E_{X(t)}[e^{Q_s} f(X(s))]] = E_x[e^{Q_t} H_s f(X(t))] = H_t(H_s f)(x).$$

Our next task concerns the determination of the infinitesimal operator of $U_t^{(q)}$.
For $\alpha > \|q^+\|$ we claim that

$$R_\alpha^{(q)} f = R_\alpha f + R_\alpha^{(q)}(q R_\alpha f) = R_\alpha f + R_\alpha(q R_\alpha^{(q)} f). \tag{12.33}$$
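To validate (12.33), set $v = R_\alpha^{(q)} f$ and $u = R_\alpha f$. Consider

$$v(x) - u(x) = E_x\left[\int_0^\infty e^{-\alpha t} f(X(t))\,(e^{Q_t} - 1)\,dt\right] \tag{12.34}$$

$$= E_x\left[\int_0^\infty e^{-\alpha t} f(X(t)) \left(\int_0^t e^{Q_s} q(X(s))\,ds\right) dt\right] \quad \text{(by pathwise integration)}$$

$$= E_x\left[\int_0^\infty e^{Q_s} q(X(s)) \left(\int_s^\infty e^{-\alpha t} f(X(t))\,dt\right) ds\right] \quad \text{(interchanging orders of integrals; see below for validation)}$$

$$= E_x\left[\int_0^\infty e^{Q_s} q(X(s))\,e^{-\alpha s} \left(\int_0^\infty e^{-\alpha t} f(X(t, \omega_s^+))\,dt\right) ds\right] \quad \text{(change of variables)}$$

$$= E_x\left[\int_0^\infty e^{Q_s} q(X(s))\,e^{-\alpha s}\,E_{X(s)}\left[\int_0^\infty e^{-\alpha t} f(X(t))\,dt\right] ds\right] \quad \text{(Markov property)}$$

$$= E_x\left[\int_0^\infty e^{Q_s} q(X(s))\,e^{-\alpha s}\,u(X(s))\,ds\right] = R_\alpha^{(q)}(qu)$$
yielding the first formula of (12.33).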


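The justification for interchanging orders of integration rests on the
existence of

$$E_x\left[\int_0^\infty \int_0^t e^{-\alpha t}\,|f(X(t))|\,e^{Q_s}\,|q(X(s))|\,ds\,dt\right] \le \|f\|\,\|q\| \int_0^\infty \int_0^t e^{-\alpha t}\,e^{\|q^+\| s}\,ds\,dt \qquad \left(\|q\| = \sup_x |q(x)|\right)$$

$$\le \frac{\|f\|\,\|q\|}{\alpha} \int_0^\infty e^{-(\alpha - \|q^+\|)s}\,ds < \infty.$$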
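A similar analysis leads to

$$v(x) - u(x) = E_x\left[\int_0^\infty e^{-\alpha t} f(X(t))\,(e^{Q_t} - 1)\,dt\right]$$

$$= E_x\left[\int_0^\infty e^{-\alpha t} f(X(t))\,e^{Q_t} \left(\int_0^t e^{-Q_s} q(X(s))\,ds\right) dt\right]$$

$$= E_x\left[\int_0^\infty e^{-Q_s} q(X(s)) \left(\int_s^\infty e^{-\alpha t} e^{Q_t} f(X(t))\,dt\right) ds\right]$$

$$= E_x\left[\int_0^\infty e^{-\alpha s} q(X(s))\,v(X(s))\,ds\right] = R_\alpha(qv)$$

which is the second equation of (12.33).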
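From the resolvent equation

$$0 = R_\alpha^{(q)} f - R_\beta^{(q)} f + (\alpha - \beta) R_\alpha^{(q)} R_\beta^{(q)} f, \qquad \alpha, \beta > \|q^+\|,$$
we find that the domain of the generator of the Kac semigroup is the range
$$\mathcal{D}(A_{(q)}) = \text{range of } R_\alpha^{(q)}, \text{ independent of } \alpha > \|q^+\|$$
(cf. Section 11, pages 293-295) and
$$A_{(q)} u = \alpha u - f, \quad \text{where } u = R_\alpha^{(q)} f \text{ for some } f \in C(S^*).$$
By virtue of (12.33), we have
$$R_\alpha f = R_\alpha^{(q)}(f - q R_\alpha f) \text{ in } \mathcal{D}(A_{(q)}), \qquad R_\alpha^{(q)} f = R_\alpha(f + q R_\alpha^{(q)} f) \text{ in } \mathcal{D}(A)$$

for $\alpha > \|q^+\|$, indicating that $\mathcal{D}(A_{(q)}) = \mathcal{D}(A)$. Suppose $u \in \mathcal{D}(A)$ and set
$u = R_\alpha f$, then $u = R_\alpha^{(q)}(f - qu)$ from (12.33) and by its determination
$A_{(q)} u = \alpha u - (f - qu)$ which reduces to $\alpha u - f + qu = Au + qu$. This completes
the proof of Theorem 12.3. ∎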
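
The resolvent identities (12.33) have an exact finite-state analogue that is easy to verify by hand or machine. The sketch below is an illustration only and not part of the original argument: the diffusion is replaced by a two-state Markov chain with a hypothetical generator matrix `A`, the rate function by a vector `q`, and the Kac generator is taken to be `A + diag(q)` in line with the statement of Theorem 12.3.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_add(a, b):
    """Sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_inv(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def resolvent(alpha, gen):
    """(alpha*I - gen)^(-1), the resolvent of the semigroup exp(t*gen)."""
    return mat_inv([[alpha - gen[0][0], -gen[0][1]], [-gen[1][0], alpha - gen[1][1]]])

A = [[-1.0, 1.0], [2.0, -2.0]]      # generator of a two-state chain (hypothetical rates)
q = [0.5, -0.3]                     # rate function q on the two states
Q = [[q[0], 0.0], [0.0, q[1]]]      # multiplication by q
alpha = 2.0                         # alpha exceeds sup q^+ = 0.5

R = resolvent(alpha, A)
Rq = resolvent(alpha, mat_add(A, Q))               # resolvent for the generator A + q

first = mat_add(R, mat_mul(Rq, mat_mul(Q, R)))     # R + R^(q) q R
second = mat_add(R, mat_mul(R, mat_mul(Q, Rq)))    # R + R q R^(q)
err = max(abs(Rq[i][j] - m[i][j])
          for m in (first, second) for i in range(2) for j in range(2))
```

In the matrix setting both identities are the familiar second resolvent equation, so the agreement is exact up to rounding; the diffusion case requires the path-integral argument given in the proof.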


D. LOCAL TIME PROCESS 


We already encountered the existence of local time in Section 8. This is a 
powerful and important concept underlying the analysis of a broad spectrum 
of functionals of sample paths. Consider a regular diffusion on I = (l,r) in 
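natural scale (cf. Remark 3.1), i.e., scale density $s(\xi) = 1$, and speed measure
$m(x) > 0$ for $x$ in $I$. The formal infinitesimal operator of the process is

$$Au = \frac{1}{2m(x)}\,\frac{d^2u}{dx^2}. \tag{12.35}$$
Under this normalization we determine the general local time process at the
origin
$$\theta(t) = \theta(t, \omega) = \lim_{\varepsilon \downarrow 0} \frac{\int_0^t I_{(-\varepsilon, \varepsilon)}(X(s, \omega))\,ds}{\int_{-\varepsilon}^{\varepsilon} m(\xi)\,d\xi}, \tag{12.36}$$
where

$$I_{(-\varepsilon, \varepsilon)}(\xi) = \begin{cases} 1, & -\varepsilon < \xi < \varepsilon, \\ 0, & \text{otherwise} \end{cases}$$

is the indicator function of the open interval $(-\varepsilon, \varepsilon)$. As usual, $\omega$ denotes a
sample path of the $X(t)$ process.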

The numerator of (12.36) measures the span of time from 0 to $t$ that $X(s)$
resides in $(-\varepsilon, \varepsilon)$. The denominator is a measure of how fast the diffusion
process clock runs when $X(s)$ is located in $(-\varepsilon, \varepsilon)$. The existence of the limit
$\theta(t, \omega)$ for almost every sample path $\omega$ is a remarkable and recondite fact.

Since $\int_0^t f(X(s, \omega))\,ds$ is an additive process for any nice function $f$ as given
in (12.5), the limit $\theta(t, \omega)$ is also an additive process so that

$$\theta(t+s, \omega) = \theta(t, \omega) + \theta(s, \omega_t^+)$$
where $\omega_t^+$ denotes the sample path $\omega$ shifted by time $t$. Moreover, it is
expected and correct that $\theta(t, \omega)$ is continuous in $t$ for almost every $\omega$. Plainly,
$\theta(t, \omega)$ is nondecreasing in $t$. Derivation of the fact that $\theta(t)$ is strictly increasing
at those times $t$ where $X(t) = 0$ is subtle. A local time process can be defined
with respect to each state point (not only the origin) where

$$\theta_a(t, \omega) = \lim_{\varepsilon \downarrow 0} \frac{\int_0^t I_{(a-\varepsilon, a+\varepsilon)}(X(s, \omega))\,ds}{\int_{a-\varepsilon}^{a+\varepsilon} m(\xi)\,d\xi}.$$

We would expect, and it is correct but formidable to demonstrate rigorously,
that

$$\int_0^t f(X(s, \omega))\,ds = \int_{-\infty}^{\infty} f(a)\,\theta_a(t, \omega)\,da. \tag{12.37}$$

This formula arises formally by a change of variables. On the right the level $a$
is occupied about $\theta_a(t, \omega)$ time units by the process excursion over the time
period $(0, t)$ so that $f(a)\theta_a(t, \omega)\,da$ is the contribution to $\int_0^t f(X(s, \omega))\,ds$ when
$X(s, \omega)$ hovers at the level $a$.
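
The change-of-variables idea behind (12.37) is transparent in discrete time and space, where it is a plain regrouping of a sum. The following sketch is a discrete analogue and not a simulation of diffusion local time: the process is a simple random walk, the occupation count `N_a` stands in for local time at level `a`, and all names are hypothetical.

```python
import random
from collections import Counter

def random_walk(n_steps, seed=0):
    """Simple symmetric random walk on the integers, started at 0."""
    rng = random.Random(seed)
    pos, path = 0, [0]
    for _ in range(n_steps):
        pos += rng.choice((-1, 1))
        path.append(pos)
    return path

def occupation_counts(path, n):
    """N_a(n) = number of times s < n with X_s = a (a discrete stand-in for local time)."""
    return Counter(path[:n])

n = 5000
path = random_walk(n)
f = lambda a: 1.0 / (1.0 + a * a)       # an arbitrary bounded test function

time_side = sum(f(x) for x in path[:n])                  # sum over time of f(X_s)
counts = occupation_counts(path, n)
space_side = sum(f(a) * c for a, c in counts.items())    # sum over levels of f(a) * N_a
gap = abs(time_side - space_side)
```

Summing over time and summing over levels weighted by the time spent at each level give the same number, which is the content of (12.37) before any limiting procedure.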

When there are absorbing boundaries with $T^*$ the absorption time, then

$$E_x[\theta_a(T^*, \omega)] = G(x, a),$$
where $G$ is the appropriate Green function (cf. page 198).


Inverse Local Time Process 


Since $\theta(t, \omega)$ is increasing and continuous, the inverse function is well defined
(see the discussion of random time changes in Section 8). We define

$$\theta^{-1}(t, \omega) = \theta_0^{-1}(t, \omega) = \theta^{-1}(t), \tag{12.38}$$
suppressing $\omega$ when no ambiguity occurs. The process $\theta^{-1}(t, \omega)$ is referred to as
inverse local time at the origin. By the nature of the construction of the inverse
function, $\theta(v, \omega)$ is necessarily strictly increasing at $v = \theta^{-1}(s, \omega)$, meaning that
$\theta(v + \varepsilon, \omega) > \theta(v, \omega)$ for every $\varepsilon > 0$ since $v = \theta^{-1}(s, \omega) = \inf\{v : \theta(v, \omega) > s\}$.
Since $\theta(t, \omega)$ is continuous, we see that

$$\theta^{-1}(t, \omega) < s \quad \text{is equivalent to} \quad \theta\left(s - \frac{1}{n}, \omega\right) > t \quad \text{for some } n, \tag{12.39}$$

so that we can decide the validity of $\theta^{-1}(t, \omega) < s$ by knowing the process
values $\{X(\tau)\}$ up to time $s$. Therefore $\theta^{-1}(t, \omega)$ is a Markov time.

We already remarked that the local time process (12.36) is defined as a
limit of additive functionals so it is also an additive functional. This means
that for $v > u$,

$$\theta(v, \omega) = \theta(u, \omega) + \theta(v - u, \omega_u^+), \tag{12.40}$$
or letting $\theta^{-1}(s) = v$ and $\theta^{-1}(t) = u$ for $t < s$, then $s = t + \theta(v - u, \omega_u^+)$. Now,
$v - u = \theta^{-1}(s - t, \omega_u^+)$ because $\theta(\tau, \omega)$ is strictly increasing at $\tau = v$ so that
$\theta(\tau, \omega_u^+)$ is also strictly increasing at $\tau = v - u$.

Thus, we have proved for $s > t$, that

$$\theta^{-1}(s, \omega) - \theta^{-1}(t, \omega) = \theta^{-1}(s - t, \omega^+_{\theta^{-1}(t)}). \tag{12.41}$$


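Using the foregoing identity and the strong Markov property, we have for
$0 = t_0 < t_1 < \cdots < t_n$, that

$$E_0\left[\exp\left\{-\sum_{i=1}^n \alpha_i [\theta^{-1}(t_i) - \theta^{-1}(t_{i-1})]\right\}\right]$$
$$= E_0\left[\prod_{i=1}^{n-1} \exp\{-\alpha_i(\theta^{-1}(t_i) - \theta^{-1}(t_{i-1}))\}\,\exp\{-\alpha_n \theta^{-1}(t_n - t_{n-1}, \omega^+_{\theta^{-1}(t_{n-1})})\}\right]$$
and by conditional independence
$$= E_0\left[\prod_{i=1}^{n-1} \exp\{-\alpha_i(\theta^{-1}(t_i) - \theta^{-1}(t_{i-1}))\}\right] E_0[\exp\{-\alpha_n \theta^{-1}(t_n - t_{n-1})\}],$$
and inductively,

$$= \prod_{i=1}^n E_0[\exp\{-\alpha_i \theta^{-1}(t_i - t_{i-1})\}]. \tag{12.42}$$

Setting $\alpha_j = 0$ ($j \ne i$), the above reduces to

$$E_0[\exp\{-\alpha_i[\theta^{-1}(t_i) - \theta^{-1}(t_{i-1})]\}] = E_0[\exp\{-\alpha_i \theta^{-1}(t_i - t_{i-1})\}].$$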
Definition 12.2. A process $W(t) = \{W(t, \omega), t \ge 0\}$ is said to be a time homogeneous
Lévy process if for every finite set of time points $0 = t_0 < t_1 < \cdots < t_n$,
and all real $\alpha_1, \ldots, \alpha_n$, then

$$E_0\left[\exp\left\{-\sum_{i=1}^n \alpha_i (W(t_i) - W(t_{i-1}))\right\}\right] = E_0\left[\prod_{i=1}^n \exp\{-\alpha_i(W(t_i) - W(t_{i-1}))\}\right] = \prod_{i=1}^n E_0[\exp\{-\alpha_i W(t_i - t_{i-1})\}]. \tag{12.43}$$


Lévy processes have been characterized as limits of compositions of inde- 
pendent families of compound Poisson processes and Brownian motion. (This 
topic is an important part of a more advanced course in stochastic analysis; 
we refer to Chapter 16 for further elaborations.) 

Combining (12.41) and (12.42), we have

$$E_0[\exp\{-\alpha \theta^{-1}(s + t, \omega)\}] = E_0[\exp\{-\alpha \theta^{-1}(s, \omega)\}]\,E_0[\exp\{-\alpha \theta^{-1}(t, \omega)\}].$$

Thus, the function $M(s, \alpha) = E_0[\exp\{-\alpha \theta^{-1}(s)\}]$ satisfies for each $\alpha > 0$ the
familiar functional equation

$$M(s + t, \alpha) = M(s, \alpha)\,M(t, \alpha), \qquad M(0, \alpha) = 1. \tag{12.44}$$

The evaluation at $t = 0$ is valid because $\theta(s, \omega)$ is strictly increasing at $s = 0$
provided $X(0) = 0$. We also know that $M(s, \alpha)$ is confined between 0 and 1. It
follows by the characterization of the exponential function (cf. Theorem 2.2,
Chapter 4) that

$$M(t, \alpha) = e^{-t\varphi(\alpha)} \tag{12.45}$$

for some nonnegative increasing function $\varphi(\alpha)$ of $\alpha > 0$.
We concentrate henceforth in this section on the diffusion with state space
$I = (-\infty, \infty)$ characterized by the speed measure $m$ and scale density $s$

$$s(\xi) = 1, \qquad m(\xi) = |\xi|^\gamma \qquad (\gamma > -1). \tag{12.46}$$

The restriction $\gamma > -1$ assures that $m(\xi)$ is integrable on any bounded interval.
The infinitesimal operator is $A = (1/2|x|^\gamma)(d^2/dx^2)$. Essentially equivalent to
(12.46) we consider the diffusion $X_\gamma$ ($= X_\gamma(t)$, $t \ge 0$) with

$$\mu(x) = 0 \quad \text{and} \quad \sigma^2(x) = \frac{1}{m(x)} = \frac{1}{|x|^\gamma}. \tag{12.47}$$
Notice that $\sigma^2(x)$ is infinite (when $\gamma > 0$) at $x = 0$, but this causes no problem
since $m$ is locally integrable and, therefore, 0 is a regular point (why?). Of
course, Brownian motion corresponds to $\gamma = 0$.
It is easy to check that the boundaries $-\infty$ and $+\infty$ are natural for each
process $X_\gamma$.


Invariance Properties of the Diffusion Process $X_\gamma(t)$


For typographical convenience, we suppress the index $\gamma$ when no ambiguities
occur. For each sample path $X(t)$ we construct a new path $\tilde{X}(t)$ by changing
the time scale and multiplying the state variable by a factor in the manner

$$\tilde{X}(t) = \frac{1}{c}\,X(t c^{\gamma+2}), \tag{12.48}$$

where $c > 0$ is fixed. The mapping (12.48) of the sample paths transforms the
probability measure $P^{(\gamma)}$ on $\Omega_\gamma$ associated with $X_\gamma$ into a probability measure
$\tilde{P}^{(\gamma)}$ over the image paths $\tilde{\omega}$ in $\tilde{\Omega}_\gamma$. The probability calculation of a collection
$\tilde{\mathcal{C}}$ of sample paths in $\tilde{\Omega}_\gamma$ is that of the collection of sample paths in $\Omega_\gamma$ which
yield $\tilde{\mathcal{C}}$ by the construction in (12.48). We claim that $\tilde{P}$ is the probability
measure of a regular diffusion $\tilde{X}$ with scale and speed measure the same as
those of $X$. In other words, $\tilde{P}$ and $P$ are equivalent measures. To see this, let $T_a$
($\tilde{T}_a$) be the first passage time to the point $a$ for the process $X$ ($\tilde{X}$). Let $\tilde{S}$ be the
scale measure of $\tilde{X}_\gamma$, and observe that

$$\Pr\{\tilde{T}_b < \tilde{T}_a \mid \tilde{X}(0) = x\} = \frac{\tilde{S}(x) - \tilde{S}(a)}{\tilde{S}(b) - \tilde{S}(a)}. \tag{12.49}$$

It is clear from the definition (12.48) that sample paths of $\tilde{X}_\gamma$ starting from
$\tilde{X}(0) = x$ for which $\tilde{T}_b < \tilde{T}_a$ coincide with sample paths of $X_\gamma$ starting from
$X(0) = cx$ for which $T_{cb} < T_{ca}$. Therefore, the right-hand side of (12.49) is
exactly

$$= \Pr\{T_{cb} < T_{ca} \mid X(0) = cx\} = \frac{cx - ca}{cb - ca} = \frac{x - a}{b - a} = \Pr\{T_b < T_a \mid X(0) = x\}.$$
These equations prove that $\tilde{S}(x) = S(x)$.
Examination of (12.48) also reveals that the time taken for $\tilde{X}$ to leave the
interval $(a, b)$ with $\tilde{X}(0) = x$ is the time taken for $X$ to leave $(ca, cb)$ starting
from $cx$, stretched by a factor $c^{-(\gamma+2)}$. Therefore

$$\tilde{E}_x[\tilde{T}_a \wedge \tilde{T}_b] = c^{-(\gamma+2)}\,E_{cx}[T_{ca} \wedge T_{cb}]. \tag{12.50}$$
For the diffusion $\tilde{X}$, it follows from (12.49) that
$$\tilde{G}_{a,b}(x, \xi) = \frac{2(x - a)(b - \xi)}{b - a}, \qquad x \le \xi,$$

is the Green function (3.30) with absorbing boundaries at $a$ and $b$ since the
process $\tilde{X}_\gamma$ is on natural scale. We know by virtue of (3.13)-(3.15) that

$$\tilde{E}_x[\tilde{T}_a \wedge \tilde{T}_b] = \int_a^b \tilde{G}_{a,b}(x, y)\,\tilde{m}(y)\,dy, \tag{12.51}$$

where $\tilde{m}(y)$ is the speed measure density of the $\tilde{X}$ process. We also know that

$$E_{cx}[T_{ca} \wedge T_{cb}] = \int_{ca}^{cb} G_{ca,cb}(cx, \xi)\,|\xi|^\gamma\,d\xi.$$

The change of variables $\xi$ to $c\eta$ in (12.52) yields

$$E_{cx}[T_{ca} \wedge T_{cb}] = c^{\gamma+1} \int_a^b G_{ca,cb}(cx, c\eta)\,|\eta|^\gamma\,d\eta.$$

But since $G_{ca,cb}(cx, c\eta) = c\,G_{a,b}(x, \eta)$, the last expression gives

$$E_{cx}[T_{ca} \wedge T_{cb}] = c^{\gamma+2} \int_a^b G_{a,b}(x, \eta)\,|\eta|^\gamma\,d\eta. \tag{12.52}$$
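Now (12.50) in light of (12.51) and (12.52) gives

$$\tilde{E}_x[\tilde{T}_a \wedge \tilde{T}_b] = \int_a^b \tilde{G}_{a,b}(x, \xi)\,\tilde{m}(\xi)\,d\xi$$

$$= c^{-(\gamma+2)}\,E_{cx}[T_{ca} \wedge T_{cb}]$$

$$= c^{-(\gamma+2)}\,c^{\gamma+2} \int_a^b G_{a,b}(x, \eta)\,|\eta|^\gamma\,d\eta$$

$$= \int_a^b G_{a,b}(x, \eta)\,m(\eta)\,d\eta. \tag{12.53}$$
As noted earlier in (12.49), $\tilde{G}_{a,b}(x, \eta) = G_{a,b}(x, \eta)$ applies to any choice of
$a < x < b$. The identity (12.53) then implies that $\tilde{m}(\eta) = |\eta|^\gamma = m(\eta)$. (Prove
this.) Of course, if the scale and speed measures $S(x)$ and $M(x)$, respectively, of
a regular diffusion are known, then the diffusion coefficients $\mu(x)$ and $\sigma^2(x)$ can
be calculated from the formulas

$$m(x) = M'(x) = \frac{1}{\sigma^2(x)\,s(x)}$$
and
$$S'(x) = s(x) = \exp\left\{-\int^x \frac{2\mu(\xi)}{\sigma^2(\xi)}\,d\xi\right\}.$$
See (3.6) and (3.7).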
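By the relationship (12.48), we see that the time spent in $(-\varepsilon, \varepsilon)$ for the $X$
process up to time $tc^{\gamma+2}$ corresponds to that spent for the
process $\tilde{X}$ in the interval $(-\varepsilon/c, \varepsilon/c)$ up to time $t$. When $X(0) = \tilde{X}(0) = 0$, these
facts imply that for the associated local time processes, $\tilde{\theta}(t)$ is equivalent to
$\theta(tc^{\gamma+2})/c$. To establish this more formally, notice that

$$\theta(tc^{\gamma+2}) = \lim_{\varepsilon \downarrow 0} \frac{\int_0^{tc^{\gamma+2}} I_{(-c\varepsilon, c\varepsilon)}[X(s)]\,ds}{\int_{-c\varepsilon}^{c\varepsilon} |\xi|^\gamma\,d\xi}$$

$$= \lim_{\varepsilon \downarrow 0} \frac{c^{\gamma+2} \int_0^t I_{(-c\varepsilon, c\varepsilon)}[X(uc^{\gamma+2})]\,du}{\int_{-c\varepsilon}^{c\varepsilon} |\xi|^\gamma\,d\xi} \qquad \text{(substituting } s = uc^{\gamma+2}\text{)}$$

$$= \lim_{\varepsilon \downarrow 0} \frac{c^{\gamma+2} \int_0^t I_{(-\varepsilon, \varepsilon)}[X(uc^{\gamma+2})/c]\,du}{\int_{-c\varepsilon}^{c\varepsilon} |\xi|^\gamma\,d\xi}$$

$$= \lim_{\varepsilon \downarrow 0} \frac{c^{\gamma+2} \int_0^t I_{(-\varepsilon, \varepsilon)}[X(uc^{\gamma+2})/c]\,du}{c^{\gamma+1} \int_{-\varepsilon}^{\varepsilon} |\eta|^\gamma\,d\eta} \qquad \text{(putting } \xi = c\eta\text{)}$$

$$= c \lim_{\varepsilon \downarrow 0} \frac{\int_0^t I_{(-\varepsilon, \varepsilon)}[\tilde{X}(u)]\,du}{\int_{-\varepsilon}^{\varepsilon} |\eta|^\gamma\,d\eta} = c\,\tilde{\theta}(t) \qquad \text{(by (12.48))}.$$

It follows that the inverse local time processes agree in the sense that

$$\theta^{-1}(s) \quad \text{is equivalent to} \quad c^{\gamma+2}\,\tilde{\theta}^{-1}(s/c), \qquad s > 0, \quad c > 0. \tag{12.54}$$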


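Theorem 12.4. For the process $X_\gamma$ of (12.48) we have
$$E_0[e^{-\alpha \theta^{-1}(t)}] = e^{-t\varphi(\alpha)}, \tag{12.55}$$

where $\varphi(\alpha) = \alpha^{1/(\gamma+2)} \varphi(1)$, and $\varphi(1) > 0$.


Proof. Since $X_\gamma$ and $\tilde{X}_\gamma$ are equivalent processes, we infer that $\theta^{-1}$ and $\tilde{\theta}^{-1}$
are also equivalent processes. Referring to (12.54), we have

$$e^{-s\varphi(\alpha)} = E_0[\exp\{-\alpha \theta^{-1}(s)\}] = E_0[\exp\{-\alpha c^{\gamma+2}\,\tilde{\theta}^{-1}(s/c)\}]$$
$$= E_0[\exp\{-\alpha c^{\gamma+2}\,\theta^{-1}(s/c)\}] = \exp\{-(s/c)\,\varphi(\alpha c^{\gamma+2})\}$$
valid for all $s$, $c$, and $\alpha$ positive. It follows that $\varphi(\alpha) = (1/c)\,\varphi(\alpha c^{\gamma+2})$ for all $\alpha$
and $c$ positive. Now determine $c$ such that $\alpha c^{\gamma+2} = 1$ or $c = \alpha^{-1/(\gamma+2)}$. Then
$\varphi(\alpha) = \alpha^{1/(\gamma+2)} \varphi(1)$ and the proof of Theorem 12.4 is complete. ∎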
Example. For Brownian motion ($\gamma = 0$),
$$E_0[e^{-\alpha \theta^{-1}(t)}] = e^{-tK\sqrt{\alpha}} \quad \text{for all } \alpha > 0 \tag{12.56}$$

for some constant $K$.

We recognize the right of (12.56) as the Laplace transform of the first
passage time from 0 to a level $t$ (see Chapter 7, page 362). Thus, the inverse local
time sample paths at the origin for Brownian motion are realized as follows. For
each level $a > 0$ on the positive axis ($a$ is to be viewed as time for the inverse
local time process), we have $\theta^{-1}(a, \omega) = T_a$, where $T_a$ is the first passage time
from the origin to the point $a$. Clearly, from its meaning $T_{a+b} = T_a + T_b(\omega^+_{T_a})$
which is the formula (12.40). Thus, the additive process with Laplace
transform

$$E_0[e^{-\alpha T_a}] = e^{-Ka\sqrt{\alpha}} = E_0[e^{-\alpha \theta^{-1}(a)}]$$

is that of a one-sided stable process of order $\tfrac12$ (consult Chapter 7, page 362).
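
The exponential form in (12.56) and the multiplicative relation (12.44) can be checked numerically for Brownian first passage times. The sketch below is an illustration, not part of the text: it uses the standard first-passage density $a(2\pi t^3)^{-1/2} e^{-a^2/2t}$ for standard Brownian motion, whose Laplace transform is $e^{-a\sqrt{2\alpha}}$, and a plain midpoint rule; the truncation point `t_max` and the grid size are arbitrary choices.

```python
import math

def first_passage_density(t, a):
    """Density of T_a for standard Brownian motion started at 0 (a > 0)."""
    return a / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-a * a / (2.0 * t))

def laplace_transform(alpha, a, t_max=60.0, n=300000):
    """Midpoint-rule approximation of E_0[exp(-alpha * T_a)] on (0, t_max)."""
    h = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.exp(-alpha * t) * first_passage_density(t, a)
    return total * h

alpha = 1.0
m1 = laplace_transform(alpha, 1.0)             # M(1, alpha)
m2 = laplace_transform(alpha, 2.0)             # M(2, alpha)
exact1 = math.exp(-1.0 * math.sqrt(2.0 * alpha))
```

The computed values satisfy $M(2, \alpha) \approx M(1, \alpha)^2$ and match the closed form with $\varphi(\alpha)$ proportional to $\sqrt{\alpha}$, as Theorem 12.4 predicts for $\gamma = 0$.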


THE PROBABILITY DISTRIBUTION OF THE LOCAL TIME $\theta(t)$ AND THE OCCUPATION
TIME OF THE POSITIVE AXIS FOR THE DIFFUSION $X_\gamma$ OF ORDER $\gamma$.


Let $\theta(t)$ be the local time for the diffusion $X_\gamma(t)$ of index $\gamma$. Then,


Theorem 12.5

$$\Pr\left\{u < \frac{\theta(t)}{c\,t^\beta} < u + du\right\} = f_\beta(u)\,du, \tag{12.57}$$
where 
V/y + 2)) 1 
= : = — 12. 
ITO + ora Pye 098) 
and $f_\beta(u)$ is the Mittag-Leffler density of index $\beta$, namely
$$f_\beta(u) = \frac{1}{\pi\beta} \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k!}\,\sin(\pi\beta k)\,\Gamma(\beta k + 1)\,u^{k-1}. \tag{12.59}$$

The name "Mittag-Leffler distribution" comes from the fact that the Laplace
transform, $M_\beta(-\lambda)$, of $f_\beta(u)$ has $M_\beta(z) = \sum_{k=0}^{\infty} z^k/\Gamma(\beta k + 1)$, the Mittag-Leffler
function of complex variables with index $\beta$.
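
For $\beta = \tfrac12$ (the Brownian case $\gamma = 0$) both the series (12.59) and the Mittag-Leffler function have elementary closed forms, which gives a convenient numerical check. The sketch below is not from the text; it assumes the standard identities $M_{1/2}(-\lambda) = e^{\lambda^2}\,\mathrm{erfc}(\lambda)$ and $f_{1/2}(u) = \pi^{-1/2} e^{-u^2/4}$, and the truncation lengths of the two series are arbitrary.

```python
import math

def mittag_leffler(z, beta, terms=120):
    """Partial sum of M_beta(z) = sum_k z^k / Gamma(beta*k + 1)."""
    return sum(z ** k / math.gamma(beta * k + 1.0) for k in range(terms))

def ml_density(u, beta, terms=80):
    """Partial sum of the series (12.59) for the Mittag-Leffler density of index beta."""
    total = 0.0
    for k in range(1, terms + 1):
        total += ((-1) ** (k - 1) / math.factorial(k)
                  * math.sin(math.pi * beta * k)
                  * math.gamma(beta * k + 1.0) * u ** (k - 1))
    return total / (math.pi * beta)

lam = 1.0
series_transform = mittag_leffler(-lam, 0.5)
closed_transform = math.exp(lam * lam) * math.erfc(lam)

u = 1.0
series_density = ml_density(u, 0.5)
closed_density = math.exp(-u * u / 4.0) / math.sqrt(math.pi)
```

For $\beta = \tfrac12$ the density is a half-normal shape, consistent with the well-known law of Brownian local time at a fixed time.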
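Proof. Observe

$$\int_0^\infty e^{-\alpha t}\,E_0[e^{-\lambda\theta(t)}]\,dt$$

$$= E_0\left[\int_0^\infty e^{-\alpha t}\,e^{-\lambda\theta(t)}\,dt\right] \quad \text{(this involves the expectation with initial position } X_\gamma(0) = 0\text{)}$$

$$= \frac{1}{\alpha} + \frac{1}{\alpha}\,E_0\left[\int_0^\infty e^{-\alpha t}\,d(e^{-\lambda\theta(t)})\right] \quad \text{(integration by parts)}$$

$$= \frac{1}{\alpha} + \frac{1}{\alpha}\,E_0\left[\int_0^\infty e^{-\alpha\theta^{-1}(t)}\,d(e^{-\lambda t})\right] \quad \text{(change of variable } \theta(t) \text{ to } t\text{)}$$

$$= \frac{1}{\alpha} - \frac{\lambda}{\alpha} \int_0^\infty E_0[e^{-\alpha\theta^{-1}(t)}]\,e^{-\lambda t}\,dt$$

$$= \frac{1}{\alpha} - \frac{\lambda}{\alpha} \int_0^\infty e^{-\varphi(\alpha)t}\,e^{-\lambda t}\,dt \quad \text{(by Theorem 12.4)}$$

$$= \frac{1}{\alpha}\left[1 - \frac{\lambda}{\lambda + \varphi(\alpha)}\right] = \sum_{k=0}^{\infty} (-1)^k\,\frac{\lambda^k c^k}{\alpha^{k\beta+1}},$$
where the last step expands $\varphi(\alpha)/(\lambda + \varphi(\alpha))$ as a geometric series, using $\varphi(\alpha) = \alpha^\beta\varphi(1)$ and writing $c = 1/\varphi(1)$.
Now take the inverse Laplace transform in $\alpha$, i.e., observe
$$\frac{1}{\alpha^{k\beta+1}} = \frac{1}{\Gamma(k\beta + 1)} \int_0^\infty e^{-\alpha t}\,t^{k\beta}\,dt$$
yields
$$E_0[e^{-\lambda\theta(t)}] = \sum_{k=0}^{\infty} \frac{(-1)^k \lambda^k c^k t^{k\beta}}{\Gamma(k\beta + 1)} = M_\beta(-\lambda c t^\beta) \tag{12.60}$$
so that
$$E_0\left[\exp\left\{-\lambda\,\frac{\theta(t)}{c\,t^\beta}\right\}\right] = M_\beta(-\lambda),$$
i.e.,

$$\int_0^\infty e^{-\lambda u}\,\Pr\left\{u < \frac{\theta(t)}{c\,t^\beta} < u + du\right\} = \int_0^\infty e^{-\lambda u} f_\beta(u)\,du$$

which proves Theorem 12.5. ∎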


E. SOME CONSTRUCTIONS OF SPACE-TIME MARTINGALES AND APPLICATIONS TO DIFFUSION 
PROCESSES 


There is a simple way to derive a variety of martingales for diffusion processes.
The result is a natural generalization of the Dynkin martingale of (12.13); the
method arose initially in the study of first passage problems through moving
boundaries.

Let $\{X(t), t \ge 0\}$ be a time homogeneous strong Markov process on the
interval $I$, with a strong infinitesimal generator $A$, whose domain is $\mathcal{D}(A)$.


Theorem 12.6. Let $v(t, x)$ be a continuously differentiable function of $t$, i.e.,
$v_t(t, x) = \partial v(t, x)/\partial t$ is jointly continuous in $t$ and $x$ and bounded for $x$ in $S$, and
suppose that for each fixed $t \ge 0$, the functions $v(t, x)$ and $v_t(t, x)$ are in $\mathcal{D}(A)$.
Then the process $\{Z(t), t \ge 0\}$ defined by

$$Z(t) = v(t, X(t)) - \int_0^t \{v_t(s, X(s)) + Av(s, X(s))\}\,ds \tag{12.61}$$

is a martingale with respect to the sigma-fields $\mathcal{F}_t$ generated by the process
values $X(u)$, $0 \le u \le t$.


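Proof. It is convenient to split the proof into several parts. First, recall that if
$w \in \mathcal{D}(A)$, the process $w(X(t)) - w(X(0)) - \int_0^t Aw(X(s))\,ds$ is a martingale (see
(12.13)). In particular, if we fix $t_1$, then the process $v(t_1, X(t)) - v(t_1, X(0))
- \int_0^t Av(t_1, X(s))\,ds$ is a martingale. Hence,

$$E\left[\int_{t_1}^{t_2} Av(t_1, X(s))\,ds \,\Big|\, \mathcal{F}_{t_1}\right] = E[v(t_1, X(t_2)) - v(t_1, X(t_1)) \mid \mathcal{F}_{t_1}]. \tag{12.62}$$

Next, a direct integration gives

$$E\left[\int_{t_1}^{t_2} v_t(s, X(t_2))\,ds \,\Big|\, \mathcal{F}_{t_1}\right] = E[v(t_2, X(t_2)) - v(t_1, X(t_2)) \mid \mathcal{F}_{t_1}]. \tag{12.63}$$

Adding (12.62) and (12.63) gives

$$E\left[\int_{t_1}^{t_2} [v_t(s, X(t_2)) + Av(t_1, X(s))]\,ds \,\Big|\, \mathcal{F}_{t_1}\right] = E[v(t_2, X(t_2)) - v(t_1, X(t_1)) \mid \mathcal{F}_{t_1}]. \tag{12.64}$$
Now we evaluate

$$E\left[\int_{t_1}^{t_2} \int_s^{t_2} Av_t(s, X(\xi))\,d\xi\,ds \,\Big|\, \mathcal{F}_{t_1}\right] = \int_{t_1}^{t_2} E\left[\int_s^{t_2} Av_t(s, X(\xi))\,d\xi \,\Big|\, \mathcal{F}_{t_1}\right] ds. \tag{12.65}$$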


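But for fixed $s$ another application of the Dynkin martingale result conveys
that

$$v_t(s, X(t)) - v_t(s, X(0)) - \int_0^t Av_t(s, X(\xi))\,d\xi$$
is a martingale. Hence, for $t_1 \le s \le t_2$, analogous to (12.62),

$$E\left[\int_s^{t_2} Av_t(s, X(\xi))\,d\xi \,\Big|\, \mathcal{F}_{t_1}\right] = E[v_t(s, X(t_2)) - v_t(s, X(s)) \mid \mathcal{F}_{t_1}],$$

and so (12.65) becomes, say,

$$E\left[\int_{t_1}^{t_2} [v_t(s, X(t_2)) - v_t(s, X(s))]\,ds \,\Big|\, \mathcal{F}_{t_1}\right] = I_1.$$

Evaluating the integral at (12.65) by switching the order of integration
gives

$$E\left[\int_{t_1}^{t_2} \int_s^{t_2} Av_t(s, X(\xi))\,d\xi\,ds \,\Big|\, \mathcal{F}_{t_1}\right] = E\left[\int_{t_1}^{t_2} \int_{t_1}^{\xi} Av_t(s, X(\xi))\,ds\,d\xi \,\Big|\, \mathcal{F}_{t_1}\right]$$

and since $A$ is linear

$$= E\left[\int_{t_1}^{t_2} A \int_{t_1}^{\xi} v_t(s, X(\xi))\,ds\,d\xi \,\Big|\, \mathcal{F}_{t_1}\right]$$

$$= E\left[\int_{t_1}^{t_2} [Av(\xi, X(\xi)) - Av(t_1, X(\xi))]\,d\xi \,\Big|\, \mathcal{F}_{t_1}\right]$$

$$= I_2, \tag{12.66}$$

say.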
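With these preliminaries at hand, we can now show that $\{Z(t), t \ge 0\}$ is a
martingale. We have to show that
$$E[Z(t_2) - Z(t_1) \mid \mathcal{F}_{t_1}] = 0, \qquad t_2 > t_1 \ge 0,$$

or, what is equivalent, that

$$E[v(t_2, X(t_2)) - v(t_1, X(t_1)) \mid \mathcal{F}_{t_1}] = E\left[\int_{t_1}^{t_2} [v_t(s, X(s)) + Av(s, X(s))]\,ds \,\Big|\, \mathcal{F}_{t_1}\right]. \tag{12.67}$$

But the right-hand side of the above expression is

$$E\left[\int_{t_1}^{t_2} [v_t(s, X(s)) + Av(s, X(s))]\,ds \,\Big|\, \mathcal{F}_{t_1}\right]$$

$$= E\left[\int_{t_1}^{t_2} [v_t(s, X(t_2)) + Av(t_1, X(s))]\,ds \,\Big|\, \mathcal{F}_{t_1}\right] - E\left[\int_{t_1}^{t_2} [v_t(s, X(t_2)) - v_t(s, X(s))]\,ds \,\Big|\, \mathcal{F}_{t_1}\right]$$

$$\quad + E\left[\int_{t_1}^{t_2} [Av(s, X(s)) - Av(t_1, X(s))]\,ds \,\Big|\, \mathcal{F}_{t_1}\right]$$

$$= E[v(t_2, X(t_2)) - v(t_1, X(t_1)) \mid \mathcal{F}_{t_1}] - I_1 + I_2 \quad \text{[using (12.64)]}$$
$$= E[v(t_2, X(t_2)) - v(t_1, X(t_1)) \mid \mathcal{F}_{t_1}].$$
The last resulting since $I_1 = I_2$ as was shown in (12.66). The identity (12.67) is
thereby established so that $\{Z(t), t \ge 0\}$ is a martingale, and the proof of
Theorem 12.6 is complete. ∎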


There are a number of special cases of this result worth mentioning. We
begin with a simple example which illustrates that if $v(t, x) \notin \mathcal{D}(A)$, then $Z(t)$
need not be a martingale.


Example. Let $\{X(t), t \ge 0\}$ be Brownian motion reflected at the origin, and
set $v(t, x) = x$. Clearly, $\{X(t), t \ge 0\}$ is not a martingale. The problem here is
that $v(t, x) \in \mathcal{D}(A)$ requires $v_x(t, 0) = 0$ (see Problem 46).


Corollary 12.1. If $v(t, x)$ satisfies the conditions of Theorem 12.6, and $v(t, x)$
satisfies $v_t(s, x) + Av(s, x) = 0$, then $v(t, X(t))$ is a martingale [cf. Chapter 7,
(5.3)].
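
The differential condition in Corollary 12.1 is easy to test numerically for a candidate function. The sketch below is an illustration, not part of the text: it takes $A = \tfrac12 d^2/dx^2$ (Brownian motion) and the exponential function $v(t, x) = e^{\theta x - \theta^2 t/2}$, both assumptions made for the example. This $v$ is unbounded, so it is not literally covered by the domain hypothesis and a truncation argument of the kind used later in this section would be needed; the code only checks the equation $v_t + Av = 0$ by finite differences.

```python
import math

def v(t, x, theta=0.7):
    """Candidate space-time function v(t, x) = exp(theta*x - theta^2 * t / 2)."""
    return math.exp(theta * x - 0.5 * theta * theta * t)

def heat_residual(func, t, x, h=1e-4):
    """Finite-difference value of v_t + (1/2) v_xx, i.e. v_t + A v for A = (1/2) d^2/dx^2."""
    v_t = (func(t + h, x) - func(t - h, x)) / (2 * h)
    v_xx = (func(t, x + h) - 2 * func(t, x) + func(t, x - h)) / (h * h)
    return v_t + 0.5 * v_xx

worst = max(abs(heat_residual(v, t, x))
            for t in (0.5, 1.0, 2.0) for x in (-1.0, 0.0, 1.5))
```

A residual at the level of the discretization error indicates that the candidate satisfies the backward equation on the sampled points, which is the analytic input the corollary asks for.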


Corollary 12.2. If $f \in \mathcal{D}(A)$, then $v(t, x) = U_t f(x)$ satisfies the conditions of
Theorem 12.6 and so $Z(t)$ is a martingale.


More specifically, for $v(t, x) = U_t f(x)$,

$$Z(t) = v(t, X(t)) - \int_0^t [v_t(s, X(s)) + Av(s, X(s))]\,ds$$
$$= v(t, X(t)) - \int_0^t \left[\left(\frac{d}{ds} U_s\right) f(X(s)) + Av(s, X(s))\right] ds$$

$$= v(t, X(t)) - 2 \int_0^t Av(s, X(s))\,ds$$

is a martingale.

In many examples, martingales can be constructed for functions not
satisfying the conditions $v(t, x), v_t(t, x) \in \mathcal{D}(A)$ by using the result of Theorem
12.6, and a truncation argument. We illustrate one such case now.


Example. Let $\{X(t), t \ge 0\}$ be a Bessel process on $(0, \infty)$ with parameter $\gamma$,
$\gamma > \tfrac12$. Then 0 is an entrance boundary, while $\infty$ is natural. The infinitesimal
generator $A$ is given by $Au = \tfrac12 u'' + (\gamma/x)u'$, with domain

$$\mathcal{D}(A) = \{u \in C^2(0, \infty) : u, Au \to 0 \text{ as } x \to \infty;\ x^{2\gamma}u'(x) \to 0 \text{ as } x \to 0\}.$$

Proceeding formally for the moment, let $g(x, \lambda)$ be the unique (apart from a
positive multiple) increasing solution with $\lim_{x \to 0} x^{2\gamma} g'(x, \lambda) = 0$ of $Ag = \lambda g$
($\lambda \ge 0$), and set $v(t, x) = e^{-\lambda t} g(x, \lambda)$. Then $v_t(t, x) = -\lambda v(t, x)$, while $Av = \lambda v$ (by
determination). Hence, $v_t(t, x) + Av(t, x) = 0$, and so by Theorem 12.6
$Y(t) = v(t, X(t))$ is a martingale. The only flaw in this argument is that $g \notin \mathcal{D}(A)$,
since $g$ is unbounded in $x$. To prove that $Y(t)$ is a martingale, we use a
truncation argument.
First, we identify $g$. We have to solve the equation

$$\tfrac12 g'' + (\gamma/x)g' - \lambda g = 0. \tag{12.68}$$
The increasing solution of this equation is
$$g(x) = g(x, \lambda) = x^{-(\gamma - 1/2)}\,I_{\gamma - 1/2}(\sqrt{2\lambda}\,x), \tag{12.69}$$

where $I_\delta(\cdot)$ is the modified Bessel function of order $\delta$. [See, for example, (6.15)].
By examining the form of $g(x, \lambda)$ given in (12.69), we find that $x^{2\gamma} g'(x, \lambda) \to 0$ as
$x \to 0$. The truncation argument goes as follows. Define functions $f_n(x)$ such
that $f_n \in C(0, \infty)$ and $f_n(x) = 1$, $0 < x \le n$. Set $g_n(x, \lambda) = g(x, \lambda) f_n(x)$. Then
$g_n \in \mathcal{D}(A)$, and by Theorem 12.6, we conclude that for $v(t, x) = e^{-\lambda t} g_n(x, \lambda)$,

$$v(t, X(t)) - \int_0^t [v_t(s, X(s)) + Av(s, X(s))]\,ds$$

is a martingale. Let $\sigma_n$ be the first passage time of the process $X(t)$ to $n$. Since
$v_t(s, x) + Av(s, x) = 0$ for $0 < x < n$, an application of the optional stopping
theorem (Theorem 8.1, Chapter 6) shows that

$$v(t \wedge \sigma_n, X(t \wedge \sigma_n)) - \int_0^{t \wedge \sigma_n} [v_t(s, X(s)) + Av(s, X(s))]\,ds$$

$$= v(t \wedge \sigma_n, X(t \wedge \sigma_n)) = e^{-\lambda(t \wedge \sigma_n)}\,g_n(X(t \wedge \sigma_n), \lambda)$$
$$= e^{-\lambda(t \wedge \sigma_n)}\,g(X(t \wedge \sigma_n), \lambda)$$
is a martingale. It follows that
$$E_x[v(t \wedge \sigma_n, X(t \wedge \sigma_n))] = E_x[v(0, X(0))] = g(x, \lambda). \tag{12.70}$$


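If we can justify the interchange of limit and expectation in (12.70), then we
achieve

$$E_x[v(t, X(t))] = E_x[e^{-\lambda t} g(X(t), \lambda)] = g(x, \lambda), \tag{12.71}$$

which implies that $v(t, X(t))$ is a martingale. To see this, notice that since $g(x, \lambda)$
is monotone, $g(X(t), \lambda)$ is Markov. Hence, by (12.71),

$$E[v(t + s, X(t + s)) \mid \mathcal{F}_s] = E[v(t + s, X(t + s)) \mid X(s)]$$
$$= e^{-\lambda s}\,E[e^{-\lambda t} g(X(t + s), \lambda) \mid X(s)]$$
$$= e^{-\lambda s} g(X(s), \lambda) = v(s, X(s)).$$

To justify the interchange in (12.70), using the notation $I_{\{\sigma_n \le t\}}$ as the
indicator function of the set $\{\omega : \sigma_n(\omega) \le t\}$, we write

$$E_x[v(t \wedge \sigma_n, X(t \wedge \sigma_n))] = E_x[v(t \wedge \sigma_n, X(t \wedge \sigma_n))\,I_{\{\sigma_n \le t\}}] + E_x[v(t \wedge \sigma_n, X(t \wedge \sigma_n))\,I_{\{\sigma_n > t\}}]$$
$$= E_x[v(\sigma_n, X(\sigma_n))\,I_{\{\sigma_n \le t\}}] + E_x[v(t, X(t))\,I_{\{\sigma_n > t\}}].$$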
The second term converges to $E_x[v(t, X(t))]$, since the random variables $I_{\{\sigma_n > t\}}$
increase to the unit function as $n$ increases, so the monotone convergence
theorem applies. It remains to show that the first term goes to zero as $n \to \infty$.
But

$$E_x[v(\sigma_n, X(\sigma_n))\,I_{\{\sigma_n \le t\}}] = E_x[e^{-\lambda\sigma_n} g(X(\sigma_n), \lambda)\,I_{\{\sigma_n \le t\}}]$$
$$\le E_x[g(X(\sigma_n), \lambda)\,I_{\{\sigma_n \le t\}}] = g(n, \lambda)\,E_x[I_{\{\sigma_n \le t\}}]$$
$$= g(n, \lambda)\,\Pr\{\sigma_n \le t \mid X(0) = x\}. \tag{12.72}$$
From properties of the Bessel function, we know that $g(n, \lambda)$ increases like
$n^{-\gamma} e^{\sqrt{2\lambda}\,n}$. We have then to show that $\Pr\{\sigma_n \le t \mid X(0) = x\}$ decreases much
faster. Now

$$\Pr\{\sigma_n \le t \mid X(0) = x\} = \Pr\left\{\max_{0 \le u \le t} X(u) \ge n \,\Big|\, X(0) = x\right\} \tag{12.73}$$
$$\text{(and introducing the notation } \Pr\nolimits_x\text{)} \quad = \Pr\nolimits_x\left\{\max_{0 \le u \le t} X(u) \ge n\right\}.$$
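In what follows, for ease of exposition we will assume that $\gamma = (r - 1)/2$ for
some integer $r \ge 2$. Then $X(t)$ is the radial process of $r$-dimensional Brownian
motion $(Z_1(t), \ldots, Z_r(t))$, say. From (12.73),

$$\Pr\nolimits_0\left\{\max_{0 \le u \le t} \sqrt{Z_1^2(u) + \cdots + Z_r^2(u)} \ge n\right\}$$

$$= \Pr\nolimits_0\left\{\max_{0 \le u \le t} (Z_1^2(u) + \cdots + Z_r^2(u)) \ge n^2\right\}$$

$$\le \Pr\nolimits_0\left\{\max_{1 \le i \le r}\,\max_{0 \le u \le t} |Z_i(u)| \ge \frac{n}{\sqrt{r}}\right\} \le r \Pr\nolimits_0\left\{\max_{0 \le u \le t} |Z_1(u)| \ge \frac{n}{\sqrt{r}}\right\}$$

$$\le 2r \Pr\nolimits_0\left\{\max_{0 \le u \le t} Z_1(u) \ge \frac{n}{\sqrt{r}}\right\} = 2r \Pr\nolimits_0\{T_{n/\sqrt{r}} \le t\}$$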

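where $T_a = \inf\{s \ge 0 : Z_1(s) \ge a\}$. But, using the result of Chapter 7, (3.4), this
is just

$$2r \int_0^t \frac{n}{\sqrt{r}}\,\frac{s^{-3/2}}{\sqrt{2\pi}}\,\exp\left\{-\frac{n^2}{2rs}\right\} ds$$

$$= 2r \exp\left\{-\frac{n^2}{2r(t+1)}\right\} \int_0^t \frac{n}{\sqrt{r}}\,\frac{s^{-3/2}}{\sqrt{2\pi}}\,\exp\left\{-\frac{n^2}{2rs} + \frac{n^2}{2r(t+1)}\right\} ds$$

$$\le 2r\sqrt{t+1}\,\exp\left\{-\frac{n^2}{2r(t+1)}\right\} \int_0^t \frac{n}{\sqrt{r(t+1)}}\,\frac{s^{-3/2}}{\sqrt{2\pi}}\,\exp\left\{-\frac{n^2}{2r(t+1)s}\right\} ds$$

$$\le 2r\sqrt{t+1}\,\exp\left\{-\frac{n^2}{2r(t+1)}\right\} \tag{12.74}$$

since

$$\int_0^\infty \frac{n}{\sqrt{r(t+1)}}\,\frac{s^{-3/2}}{\sqrt{2\pi}}\,\exp\left\{-\frac{n^2}{2r(t+1)s}\right\} ds = 1.$$

Hence, we can conclude from (12.73) and the inequality (12.74) that
$g(n, \lambda)\Pr_x\{\sigma_n \le t\} \to 0$ as $n \to \infty$. Hence $v(t, X(t)) = e^{-\lambda t} g(X(t), \lambda)$ is a martingale
whenever $\gamma = (r - 1)/2$. The more general cases $\tfrac12 < \gamma$ can be proved
with more arduous estimates of $\Pr_x\{\sigma_n \le t\}$ for Bessel processes.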
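
The Bessel-function solution used in this example can be verified numerically against the equation $\tfrac12 g'' + (\gamma/x)g' - \lambda g = 0$. The sketch below is not from the text: it evaluates the modified Bessel function $I_\nu$ by its standard power series (truncated at an arbitrary number of terms), forms $g(x, \lambda) = x^{-(\gamma - 1/2)} I_{\gamma - 1/2}(\sqrt{2\lambda}\,x)$, and measures the residual of the equation by central differences; the parameter values tried are illustrative.

```python
import math

def bessel_i(nu, z, terms=60):
    """Modified Bessel function I_nu(z) from its power series (z > 0)."""
    return sum((z / 2.0) ** (2 * k + nu) / (math.factorial(k) * math.gamma(k + nu + 1.0))
               for k in range(terms))

def g(x, lam, gamma_):
    """g(x, lambda) = x^(1/2 - gamma) * I_{gamma - 1/2}(sqrt(2*lambda) * x)."""
    nu = gamma_ - 0.5
    return x ** (-nu) * bessel_i(nu, math.sqrt(2.0 * lam) * x)

def ode_residual(x, lam, gamma_, h=1e-3):
    """(1/2) g'' + (gamma/x) g' - lambda*g, approximated by central differences."""
    g0, gp, gm = g(x, lam, gamma_), g(x + h, lam, gamma_), g(x - h, lam, gamma_)
    d1 = (gp - gm) / (2 * h)
    d2 = (gp - 2 * g0 + gm) / (h * h)
    return 0.5 * d2 + gamma_ / x * d1 - lam * g0

lam = 1.0
worst_relative = max(abs(ode_residual(x, lam, gam)) / max(1.0, g(x, lam, gam))
                     for gam in (1.0, 1.5, 2.3) for x in (0.5, 1.0, 2.0))
increasing = all(g(x + 0.5, lam, 1.5) > g(x, lam, 1.5) for x in (0.5, 1.0, 1.5, 2.0))
```

The residual stays at the level of the finite-difference error and the sampled values increase in $x$, in line with the description of $g$ as the increasing solution.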


13: The Spectral Representation of the Transition Density 
for a Diffusion 


Consider at first a regular diffusion on a finite closed interval $I = [l, r]$, $\sigma^2(x)$
continuous and positive over $I$, with $l, r$ separately exit and/or reflecting
boundaries. With $f(x)$ bounded and continuous on $(l, r)$ the function

$$u(t, x) = E_x[f(X(t))]$$
satisfies the backward differential equation [cf. (5.3)]

$$\frac{\partial u}{\partial t} = \frac{1}{2}\,\frac{1}{m(x)}\,\frac{\partial}{\partial x}\left(\frac{1}{s(x)}\,\frac{\partial u}{\partial x}\right) = Lu \tag{13.1}$$

with initial condition $u(0, x) = f(x)$ and boundary condition

$$u(t, l) = 0 \quad \text{if } l \text{ is an exit boundary,}$$
$$\frac{\partial}{\partial x} u(t, l) = 0 \quad \text{if } l \text{ is reflecting,} \tag{13.2}$$

and similarly at the right boundary $r$.
By the method of separation of variables we attempt a solution of (13.1) of
the form

$$u(t, x) = c(t)\,\varphi(x). \tag{13.3}$$

Substituting into (13.1) and rearranging gives

$$\frac{c'(t)}{c(t)} = \frac{L\varphi(x)}{\varphi(x)}. \tag{13.4}$$
The left-hand side depends only on the variable $t$, while the right-hand side
involves only the variable $x$. This is consistent only if (13.4) is a constant,
yielding accordingly

$$c'(t) = -\lambda c(t) \tag{13.5}$$
and

$$L\varphi(x) = -\lambda\varphi(x), \tag{13.6}$$
where $\lambda$ is a constant and $\varphi(x)$ obeys the proper boundary conditions, e.g.,
$\varphi(l) = \varphi(r) = 0$ if $l$ and $r$ are exit points.

For many specifications of boundary conditions, especially where $[l, r]$ is
bounded, admissible solutions of (13.6) obeying the requisite boundary conditions
can be realized only for a countable sequence $\{\lambda_n\}_0^\infty$, the "spectrum"
(eigenvalues) of the operator $L$. In other words, there exists a unique infinite
sequence

$$0 \le \lambda_0 < \lambda_1 < \cdots < \lambda_n < \cdots, \qquad \lambda_n \uparrow \infty \tag{13.7a}$$
and associated eigenfunctions
$$\varphi_0(x), \varphi_1(x), \ldots, \varphi_n(x), \ldots \tag{13.7b}$$
such that
$$L\varphi_n(x) = -\lambda_n \varphi_n(x) \tag{13.8}$$

and then from (13.5), we have $c_n(t) = c_n e^{-\lambda_n t}$, where $c_n$ is a free constant. Thus,
this procedure produces the solution $c_n e^{-\lambda_n t}\varphi_n(x)$ of (13.1).

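Integration by parts shows that $\{\varphi_n(x)\}$ constitutes an orthogonal sequence of functions with respect to the density $m(x)$, i.e.,

\[ \int_l^r m(x)\varphi_i(x)\varphi_j(x)\,dx = 0, \qquad i \ne j. \tag{13.9} \]

In fact,

\[ -\lambda_i\int_l^r m(x)\varphi_i(x)\varphi_j(x)\,dx = \int_l^r m(x)\,(L\varphi_i)(x)\,\varphi_j(x)\,dx = \frac{1}{2}\int_l^r \varphi_j(x)\,\frac{d}{dx}\!\left(\frac{1}{s(x)}\frac{d\varphi_i}{dx}\right)dx. \]

Integration by parts twice, and use of the boundary conditions, yields

\[ = \int_l^r m(x)\,L\varphi_j(x)\,\varphi_i(x)\,dx = -\lambda_j\int_l^r m(x)\varphi_i(x)\varphi_j(x)\,dx. \]

If $i \ne j$, then $\lambda_i \ne \lambda_j$ by (13.7a), and relation (13.9) is valid.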


As the differential equation is linear, superposition of the solutions $u_n(t, x) = c_n e^{-\lambda_n t}\varphi_n(x)$ produces a series solution

\[ u(t, x) = \sum_{n=0}^{\infty} c_n e^{-\lambda_n t}\varphi_n(x), \tag{13.10} \]

where the $\{c_n\}$ are to be determined to guarantee the initial condition $u(0, x) = f(x)$ at $t = 0$. On account of the orthogonality (13.9) of $\{\varphi_n\}$ and its “completeness” endowment, we may expect

\[ c_n = \pi_n\int_l^r f(y)\varphi_n(y)m(y)\,dy, \]

where $1/\pi_n = \int_l^r \varphi_n^2(y)m(y)\,dy$.

For the specification

\[ f(y) = \begin{cases} 1, & a < y < b,\\ 0, & \text{elsewhere,} \end{cases} \]

the above constructions suggest that

\[ u(t, x) = \int_a^b p(t, x, y)\,dy = \sum_{n=0}^{\infty} e^{-\lambda_n t}\varphi_n(x)\pi_n\int_a^b \varphi_n(y)m(y)\,dy. \]

Dividing by $(b - a)$ and letting the interval $(a, b)$ shrink to $y$ we obtain formally the spectral representation

\[ p(t, x, y) = m(y)\sum_{n=0}^{\infty} e^{-\lambda_n t}\varphi_n(x)\varphi_n(y)\pi_n. \tag{13.11} \]


n= 


Such expansions are legitimate in many cases. We illustrate some classical examples where the $\varphi_n(x)$ are identifiable. The appropriate boundary conditions entail:


(i) Vanishing at an exit boundary.
(ii) Derivative vanishing at a reflecting boundary.
(iii) $d\varphi/dS = 0$ at an entrance boundary.


Other kinds of lateral and/or boundary conditions relate to boundedness or 
integrability requirements. 

It is useful to highlight the spectral representation for a variety of diffusion 
processes of classical types. 


A. ORNSTEIN-UHLENBECK PROCESS 


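\[ \mu(x) = -x, \qquad \sigma^2(x) = 1. \]

It is known (cf. Section 5) that the transition density is explicitly

\[
\begin{aligned}
p(t, x, y) &= \frac{1}{\sqrt{\pi(1 - e^{-2t})}}\exp\!\left[-\frac{(y - xe^{-t})^2}{1 - e^{-2t}}\right]\\
&= \frac{e^{-y^2}}{\sqrt{\pi(1 - e^{-2t})}}\exp\!\left[-\frac{(x^2 + y^2)e^{-2t}}{1 - e^{-2t}} + \frac{2xye^{-t}}{1 - e^{-2t}}\right].
\end{aligned}
\tag{13.12a}
\]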


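The equation (13.1) can be written as

\[ \frac{\partial u}{\partial t} = \frac{1}{2}e^{x^2}\frac{\partial}{\partial x}\!\left(e^{-x^2}\frac{\partial u}{\partial x}\right) = \frac{1}{2}\left(\frac{\partial^2 u}{\partial x^2} - 2x\frac{\partial u}{\partial x}\right). \]

In the case at hand the eigenfunctions for (13.6) satisfy

\[ L\varphi(x) = \frac{1}{2}\left(\frac{d^2\varphi}{dx^2} - 2x\frac{d\varphi}{dx}\right) = -\lambda\varphi(x), \qquad -\infty < x < \infty, \]

subject to the integrability condition $\int_{-\infty}^{\infty} e^{-x^2}[\varphi(x)]^2\,dx < \infty$. These are the classical Hermite polynomials

\[ H_n(x) = (-1)^n e^{x^2}\frac{d^n}{dx^n}\bigl(e^{-x^2}\bigr) \]

with associated eigenvalues $\lambda_n = n$, respectively, and then we obtain the spectral representation

\[ p(t, x, y) = e^{-y^2}\sum_{n=0}^{\infty} e^{-nt}H_n(x)H_n(y)\pi_n, \tag{13.12b} \]

where $\pi_n^{-1} = \sqrt{\pi}\,2^n n!$. By virtue of a classical formula† for the sum of the series, we find that (13.12b) reduces to (13.12a).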
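
The agreement between the series (13.12b) and the closed form (13.12a) can be checked numerically. The following Python sketch is only an illustration: the function names, the truncation level `n_terms`, and the sample point are assumptions made here, and the Hermite polynomials are generated by the standard recurrence $H_{n+1}(x) = 2xH_n(x) - 2nH_{n-1}(x)$ rather than by the derivative formula in the text.

```python
import math


def ou_density_closed(t, x, y):
    """Closed-form density (13.12a) for mu(x) = -x, sigma^2(x) = 1."""
    q = math.exp(-2.0 * t)
    return math.exp(-(y - x * math.exp(-t)) ** 2 / (1.0 - q)) / math.sqrt(math.pi * (1.0 - q))


def ou_density_series(t, x, y, n_terms=80):
    """Truncated Hermite expansion (13.12b); n_terms is an illustrative cutoff."""
    hx_prev, hx = 1.0, 2.0 * x  # H_0(x), H_1(x)
    hy_prev, hy = 1.0, 2.0 * y  # H_0(y), H_1(y)
    inv_pi_n = math.sqrt(math.pi)  # 1/pi_0 = sqrt(pi) * 2^0 * 0!
    total = 1.0 / inv_pi_n  # n = 0 term: H_0(x) H_0(y) pi_0
    for n in range(1, n_terms):
        inv_pi_n *= 2.0 * n  # now equals sqrt(pi) * 2^n * n!
        total += math.exp(-n * t) * hx * hy / inv_pi_n
        # Recurrence H_{n+1}(z) = 2 z H_n(z) - 2 n H_{n-1}(z)
        hx_prev, hx = hx, 2.0 * x * hx - 2.0 * n * hx_prev
        hy_prev, hy = hy, 2.0 * y * hy - 2.0 * n * hy_prev
    return math.exp(-y * y) * total


if __name__ == "__main__":
    t, x, y = 0.5, 0.3, -0.7
    print(ou_density_series(t, x, y), ou_density_closed(t, x, y))
```

For moderate $t$ the factor $e^{-nt}$ makes the truncated sum converge quickly; for very small $t$ many more terms would be needed, which is one reason the closed form is preferable in practice.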


B. RADIAL O.U. (ORNSTEIN-UHLENBECK) PROCESS IN N DIMENSIONS

\[ \mu(x) = \frac{N - 1}{2x} - x, \qquad \sigma^2(x) = 1, \qquad 0 < x < \infty. \tag{13.13} \]


These constitute the infinitesimal parameters that arise when imposing on an 
N-dimensional Brownian motion a restoring force directly proportional to its 
distance from the origin. 

The corresponding backward differential equation is 


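\[ \frac{\partial u}{\partial t} = \frac{1}{2}\,\frac{1}{x^{N-1}e^{-x^2}}\,\frac{\partial}{\partial x}\!\left(x^{N-1}e^{-x^2}\frac{\partial u}{\partial x}\right) = \frac{1}{2}\frac{\partial^2 u}{\partial x^2} + \left(\frac{N - 1}{2x} - x\right)\frac{\partial u}{\partial x}, \qquad 0 < x < \infty. \]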


The infinite point is a natural boundary while the origin is an entrance boundary for $N > 1$ with the associated condition

\[ \frac{\partial u}{\partial S}\bigg|_{x=0} = 0. \]

Subject to this constraint, the solutions of

\[ \frac{1}{2}\frac{d^2\varphi}{dx^2} + \left(\frac{N - 1}{2x} - x\right)\frac{d\varphi}{dx} = -\lambda\varphi \]

are Laguerre polynomials. (The Laguerre polynomials $L_n^{(\alpha)}(\xi)$ with parameter $\alpha$, usually normalized by the conditions $L_n^{(\alpha)}(0) > 0$ and

\[ \int_0^{\infty}\bigl[L_n^{(\alpha)}(\xi)\bigr]^2\frac{\xi^{\alpha}e^{-\xi}}{\Gamma(\alpha + 1)}\,d\xi = \binom{n + \alpha}{n}, \]

satisfy the differential equation $\xi L_n^{(\alpha)\prime\prime}(\xi) + (\alpha + 1 - \xi)L_n^{(\alpha)\prime}(\xi) + nL_n^{(\alpha)}(\xi) = 0$.) The Laguerre system $L_n^{(\alpha)}(\xi)$ comprises the unique orthogonal polynomials with respect to the weight function $w(\xi) = \xi^{\alpha}e^{-\xi}/\Gamma(\alpha + 1)$ for $\xi > 0$; $w(\xi) = 0$ for $\xi < 0$. When $\alpha = -\tfrac12$, $L_n^{(-1/2)}(x^2)$ coincides with the Hermite polynomials, viz., $H_{2n}(x) = (-1)^n 2^{2n} n!\,L_n^{(-1/2)}(x^2)$.

† A. Erdélyi, ed., “Higher Transcendental Functions,” Vol. I, p. 194. McGraw-Hill, New York, 1953.

The relevant spectral representation for (13.13) is 


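\[ p(t, x, y) = y^{N-1}e^{-y^2}\sum_{n=0}^{\infty} e^{-2nt}L_n^{(\alpha)}(x^2)L_n^{(\alpha)}(y^2)\pi_n, \tag{13.14} \]

where

\[ \alpha = \frac{N}{2} - 1, \qquad \pi_n = \frac{2\,\Gamma(n + 1)}{\Gamma(n + \alpha + 1)}. \]

The series (13.14) can be summed to give

\[ p(t, x, y) = \frac{2y^{N-1}e^{-y^2}}{1 - e^{-2t}}\exp\!\left[-\frac{(x^2 + y^2)e^{-2t}}{1 - e^{-2t}}\right]\bigl(xye^{-t}\bigr)^{-\alpha}\,I_{\alpha}\!\left(\frac{2xye^{-t}}{1 - e^{-2t}}\right), \tag{13.14a} \]

where $I_{\alpha}$ is the modified Bessel function of order $\alpha$.‡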


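C. POPULATION GROWTH MODEL

\[ \mu(x) = bx + c, \qquad \sigma^2(x) = 2ax, \qquad 0 < x < \infty, \]

where $a$, $b$, $c$ are constants, $a > 0$. A straightforward but arduous separation of variables, provided $b < 0$ and $c > 0$, leads to

\[ p(t, x, y) = \left(\frac{|b|}{a}\right)^{\alpha+1}y^{\alpha}e^{by/a}\sum_{n=0}^{\infty} e^{nbt}\,L_n^{(\alpha)}\!\left(\frac{-bx}{a}\right)L_n^{(\alpha)}\!\left(\frac{-by}{a}\right)\frac{\Gamma(n + 1)}{\Gamma(n + \alpha + 1)}, \tag{13.15} \]

where $\alpha = (c/a) - 1$, and so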


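\[ p(t, x, y) = \frac{(|b|/a)^{\alpha+1}\,y^{\alpha}e^{by/a}}{\Gamma(\alpha + 1)\,(1 - e^{bt})^{\alpha+1}}\exp\!\left[\frac{b(x + y)e^{bt}}{a(1 - e^{bt})}\right]\sum_{k=0}^{\infty}\frac{1}{k!\,(\alpha + 1)_k}\left[\frac{b^2xy\,e^{bt}}{a^2(1 - e^{bt})^2}\right]^k, \]

where $(\alpha + 1)_k = (\alpha + 1)(\alpha + 2)\cdots(\alpha + k)$ and $(\alpha + 1)_0 = 1$.


‡ See Erdélyi, “Higher Transcendental Functions,” Vol. I, p. 189.

\[ I = (-1, 1), \qquad \mu(x) = \tfrac12\bigl[(\beta + 1)(1 - x) - (\alpha + 1)(1 + x)\bigr], \qquad \sigma^2(x) = 1 - x^2. \]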


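For $\alpha$ and $\beta$ positive, the boundaries $-1$ and $1$ are entrance boundaries. The corresponding differential equation (13.1) becomes

\[ \frac{\partial u}{\partial t} = \frac{1}{2(1 - x)^{\alpha}(1 + x)^{\beta}}\,\frac{\partial}{\partial x}\!\left[(1 - x)^{\alpha+1}(1 + x)^{\beta+1}\frac{\partial u}{\partial x}\right]. \]

The boundary conditions for entrance boundaries (see Theorem 12.2, page 306) are equivalent to

\[ (1 - x)^{\alpha+1}\frac{\partial u}{\partial x}\bigg|_{x=1^-} = 0, \qquad (1 + x)^{\beta+1}\frac{\partial u}{\partial x}\bigg|_{x=-1^+} = 0. \tag{13.16} \]

The solution of

\[ \frac{1}{2(1 - x)^{\alpha}(1 + x)^{\beta}}\,\frac{d}{dx}\!\left[(1 - x)^{1+\alpha}(1 + x)^{1+\beta}\frac{d\varphi}{dx}\right] = -\lambda\varphi(x) \]

under the boundary conditions (13.16) singles out the Jacobi orthogonal polynomials.

The spectral expansion becomes

\[ p(t, x, y) = \frac{(1 - y)^{\alpha}(1 + y)^{\beta}\,\Gamma(\alpha + \beta + 2)}{2^{\alpha+\beta+1}\,\Gamma(\alpha + 1)\Gamma(\beta + 1)}\sum_{n=0}^{\infty} e^{-n(n+\alpha+\beta+1)t/2}\,P_n(x)P_n(y)\pi_n, \tag{13.17} \]

where $P_n(x) = P_n^{(\alpha,\beta)}(x)$ are the Jacobi polynomials normalized to have value 1 at $x = 1$ and

\[ \pi_n = \frac{\Gamma(\beta + 1)\,\Gamma(n + \alpha + 1)\,\Gamma(n + \alpha + \beta + 1)\,(2n + \alpha + \beta + 1)}{\Gamma(\alpha + 1)\,\Gamma(\alpha + \beta + 1)\,\Gamma(n + 1)\,\Gamma(n + \beta + 1)\,(\alpha + \beta + 1)}. \]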


E. RADIAL MOTION OF B.M. IN N-DIMENSION FOR A PARTICLE STARTED INSIDE THE UNIT SPHERE 
AND TERMINATED (FIXED) AT THE INSTANT IT ATTAINS THE SURFACE OF THE UNIT SPHERE 
The process $Y(t)$ is a diffusion on $0 < x < 1$ for which the point 1 is a trap state. Its backward diffusion equation is

\[ \frac{\partial u}{\partial t} = \frac{1}{2}\,\frac{1}{y^{N-1}}\,\frac{\partial}{\partial y}\!\left(y^{N-1}\frac{\partial u}{\partial y}\right) \]

with the boundary condition $u = 0$ at the trap boundary $y = 1$. The reflecting barrier (or entrance boundary) condition at $x = 0$ is $\partial u/\partial S|_{x=0} = 0$. The diffusion coefficients are

\[ \mu(x) = \frac{N - 1}{2x}, \qquad \sigma^2(x) = 1, \qquad 0 < x < 1 \]

(compare to the Bessel process). The eigenfunction equation (13.6) in the present context is

\[ \frac{1}{2}\,\frac{1}{x^{N-1}}\,\frac{d}{dx}\!\left(x^{N-1}\frac{d\varphi}{dx}\right) = -\lambda\varphi(x) \tag{13.18} \]

coupled to the boundary conditions

\[ x^{N-1}\frac{d\varphi}{dx}\bigg|_{x=0} = 0, \qquad \varphi(1) = 0. \tag{13.19} \]

The eigenfunctions of (13.18)-(13.19) can be identified to be

\[ \varphi_n(x) = x^{-(N-2)/2}J_{(N-2)/2}(j_n x), \tag{13.20} \]

where $\{j_n\}$ is the sequence of positive zeros of the Bessel function $J_{(N-2)/2}(x)$; the corresponding eigenvalues are $\lambda_n = j_n^2/2$. The spectral expansion of the transition density is

\[ p(t, x, y) = y^{N-1}\sum_{n=0}^{\infty} e^{-\lambda_n t}\varphi_n(x)\varphi_n(y)\pi_n, \tag{13.21} \]

where $\pi_n^{-1} = \int_0^1 \varphi_n^2(x)x^{N-1}\,dx$.

† See Erdélyi, “Higher Transcendental Functions,” Vol. I, p. 169.


F. WRIGHT-FISHER GENE FREQUENCY DIFFUSION WITHOUT MUTATION (cf. EXAMPLE F, SECTION 2)

\[ \mu(x) = 0, \qquad \sigma^2(x) = 2\gamma x(1 - x), \qquad I = [0, 1]. \tag{13.22} \]

The backward differential equation corresponding to (13.22) is

\[ \frac{\partial u}{\partial t} = \gamma x(1 - x)\frac{\partial^2 u}{\partial x^2} \tag{13.23} \]

with boundary conditions $u(t, 0) = u(t, 1) = 0$. The resulting eigenvalue differential equation problem (13.6) is

\[ \gamma x(1 - x)\frac{d^2\varphi}{dx^2} = -\lambda\varphi \tag{13.24} \]

subject to $\varphi(0) = \varphi(1) = 0$. Reference to texts on classical special functions shows that for $n \ge 1$,

\[ \lambda_n = \gamma n(n + 1) \]

and

\[ \varphi_n(x) = x(1 - x)P_{n-1}^{(1,1)}(1 - 2x), \tag{13.25} \]

where $P_n^{(1,1)}(\xi)$ are the Jacobi orthogonal polynomials of parameters $\alpha = 1$, $\beta = 1$ normalized so that $P_n^{(1,1)}(1) = 1$ [see after (13.17)].


These polynomials are orthogonal with respect to the weight function $6x(1 - x)$ over the interval $[0, 1]$. The expansion of the transition density is

\[ p(t, x, y) = x(1 - x)\sum_{n=1}^{\infty} e^{-\gamma n(n+1)t}\,P_{n-1}^{(1,1)}(1 - 2x)\,P_{n-1}^{(1,1)}(1 - 2y)\,n(n + 1)(2n + 1). \tag{13.26} \]

Observe that $p(t, x, y)$ goes to zero at the rate $e^{-2\gamma t}$, as is anticipated because absorption at 0 and 1 are certain outcomes. The conditional limiting density of the process, given absorption has not occurred, is uniform:

\[ \lim_{t\to\infty}\Pr\{y < X(t) < y + dy \mid X(0) = x,\ X(t) \ne 0 \text{ or } 1\} = 1\,dy. \]


Continuous Spectrum 


There is not always available a series spectral representation of the transition 
density of a one dimensional diffusion. This situation occurs where the spectrum 
(set of eigenvalues of (13.6)) is not discrete but involves a continuous portion. 
The generalized spectral representation replaces the series by an integral of the 
form 


\[ p(t, x, y) = m(y)\int e^{-\lambda t}\varphi(x, \lambda)\varphi(y, \lambda)\,d\rho(\lambda). \tag{13.27} \]


There are cases involving both discrete spectra (eigenvalues) and continuous 
spectral sets. The more intricate models will not be presented here. However, 
some of the standard diffusion examples have only continuous spectra. 


G. BROWNIAN MOTION 


There is no discrete spectrum in this model. In fact, the spectral expansion of the density conforming to (13.27) becomes

\[ p(t, x, y) = \frac{e^{-(x-y)^2/2t}}{\sqrt{2\pi t}} = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\lambda^2 t/2}e^{ix\lambda}e^{-iy\lambda}\,d\lambda, \tag{13.28} \]

the Fourier transform of a Gaussian type kernel, actually the inverse Fourier transform of the Gaussian kernel. In this model $\varphi(x, \lambda) = e^{ix\lambda}$, which satisfies the equation $\varphi''(x, \lambda) = -\lambda^2\varphi(x, \lambda)$ for each real $\lambda$, where here the natural boundary conditions demand that $\varphi(x, \lambda)$ be bounded for all real $x$.
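
The Fourier representation (13.28) is easy to confirm numerically. The sketch below is illustrative only: the function names, the cutoff `lam_max`, and the step count are assumptions chosen here, and the integral is approximated by a plain trapezoidal rule. Only the real part $\cos((x - y)\lambda)$ of $e^{ix\lambda}e^{-iy\lambda}$ is integrated, since the imaginary part is odd in $\lambda$ and contributes nothing.

```python
import math


def bm_density_closed(t, x, y):
    """Gaussian transition density of standard Brownian motion."""
    return math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def bm_density_spectral(t, x, y, lam_max=40.0, n_steps=40000):
    """Trapezoidal approximation of the right-hand side of (13.28).

    lam_max truncates the infinite range; the integrand is negligible
    beyond it when lam_max**2 * t is large.
    """
    h = 2.0 * lam_max / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        lam = -lam_max + k * h
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * math.exp(-0.5 * lam * lam * t) * math.cos((x - y) * lam)
    return total * h / (2.0 * math.pi)


if __name__ == "__main__":
    print(bm_density_spectral(0.5, 0.2, 1.1), bm_density_closed(0.5, 0.2, 1.1))
```

The same routine, with the integrand replaced by a product of cosines over $(0, \infty)$, could be used to examine the reflecting case.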


H. REFLECTING BROWNIAN MOTION ON (0, ∞)


The spectral expansion of the transition density for $x, y > 0$ is

\[ p(t, x, y) = \frac{1}{\sqrt{2\pi t}}\left[e^{-(x-y)^2/2t} + e^{-(x+y)^2/2t}\right] = \frac{2}{\pi}\int_0^{\infty} e^{-\lambda^2 t/2}(\cos x\lambda)(\cos y\lambda)\,d\lambda, \tag{13.29} \]

the Fourier cosine transform of the Gaussian kernel.


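I. RADIAL BROWNIAN MOTION IN $E^N$

The backward equation is

\[ \frac{\partial u}{\partial t} = \frac{1}{2}\frac{\partial^2 u}{\partial x^2} + \frac{\gamma}{x}\frac{\partial u}{\partial x}, \qquad 0 < x < \infty, \quad \gamma = (N - 1)/2. \]

The transition density has the form

\[ p(t, x, y) = m(y)\int_0^{\infty} e^{-\lambda^2 t/2}\,T(\lambda x)T(\lambda y)m(\lambda)\,d\lambda \]

with

\[ m(y) = \frac{2^{-(\gamma - 1/2)}}{\Gamma(\gamma + \tfrac12)}\,y^{2\gamma}, \qquad T(x) = \Gamma(\gamma + \tfrac12)\left(\frac{x}{2}\right)^{1/2-\gamma}J_{\gamma-1/2}(x), \tag{13.30} \]

where $J_{\delta}$ is the classical Bessel function of order $\delta$.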


Some Diffusions with Spherical Harmonics 


In the remainder of this section we describe a geometrical construction which relates the transition probability (13.17) for the special case $\alpha = \beta = (N - 1)/2$, $N = 0, 1, 2, \ldots$, with a Brownian motion on the surface of the unit sphere $\Omega$ in a Euclidean space of $N + 2$ dimensions. If $\xi_1, \ldots, \xi_{N+2}$ are Cartesian coordinates on the surface of the unit sphere $\Omega$ ($\xi_1^2 + \cdots + \xi_{N+2}^2 = 1$) then spherical coordinates $\theta_1, \ldots, \theta_N, \phi$ are defined by

\[
\begin{aligned}
\xi_1 &= \cos\theta_1,\\
\xi_2 &= \sin\theta_1\cos\theta_2,\\
&\;\;\vdots\\
\xi_N &= \sin\theta_1\sin\theta_2\cdots\sin\theta_{N-1}\cos\theta_N,\\
\xi_{N+1} &= \sin\theta_1\cdots\sin\theta_N\cos\phi,\\
\xi_{N+2} &= \sin\theta_1\cdots\sin\theta_N\sin\phi,
\end{aligned}
\tag{13.31}
\]
where $0 \le \theta_i \le \pi$ and $0 \le \phi \le 2\pi$. The Laplace-Beltrami operator $\Delta$ (the analog of the Laplacian in $N$ dimensions) referring to the unit sphere $\Omega$ is given by

\[
\begin{aligned}
\Delta u ={}& (\sin\theta_1)^{-N}\frac{\partial}{\partial\theta_1}\!\left[(\sin\theta_1)^{N}\frac{\partial u}{\partial\theta_1}\right]\\
&+ (\sin\theta_1)^{-2}(\sin\theta_2)^{1-N}\frac{\partial}{\partial\theta_2}\!\left[(\sin\theta_2)^{N-1}\frac{\partial u}{\partial\theta_2}\right] + \cdots\\
&+ (\sin\theta_1\cdots\sin\theta_{N-1})^{-2}(\sin\theta_N)^{-1}\frac{\partial}{\partial\theta_N}\!\left[\sin\theta_N\frac{\partial u}{\partial\theta_N}\right]\\
&+ (\sin\theta_1\cdots\sin\theta_N)^{-2}\frac{\partial^2 u}{\partial\phi^2}.
\end{aligned}
\]

The diffusion equation $\partial u/\partial t = \Delta u$ has a unique fundamental solution on $\Omega$, denoted by $p(t, \xi, \eta)$, and this solution is the transition probability density of a stationary Markov process on $\Omega$ (the term “density” means relative to the surface element $d\omega$ on $\Omega$). We will assume that the process $\xi(t)$, called Brownian motion on $\Omega$, has continuous path functions. The transition density has the representation

\[ p(t, \xi, \eta) = \sum_{n=0}^{\infty} e^{-\lambda_n t}\sum_{l=1}^{h(n)} S_n^l(\xi)S_n^l(\eta), \tag{13.32} \]

where

\[ \lambda_n = n(n + \alpha + \beta + 1) = n(n + N), \qquad h(n) = \frac{(n + N - 1)!\,(2n + N)}{n!\,N!}, \]

and $S_n^l(\xi)$, $l = 1, 2, \ldots, h(n)$, is a complete orthonormal set of surface harmonics of degree $n$ (e.g., see Erdélyi†). By virtue of the addition theorem‡ for surface harmonics we have the alternative representation

\[ p(t, \xi, \eta) = \sum_{n=0}^{\infty} e^{-\lambda_n t}\,\frac{h(n)}{\omega(N)}\,P_n(\langle\xi, \eta\rangle), \tag{13.33} \]

where $\langle\xi, \eta\rangle = \xi_1\eta_1 + \cdots + \xi_{N+2}\eta_{N+2}$ is the cosine of the angle between the unit vectors $\xi$ and $\eta$, and $\omega(N)$ is the surface area of the unit sphere in $N + 2$ dimensions. $P_n$ is an appropriate Jacobi polynomial.

Now we start a Brownian motion $\xi(t)$ on the sphere with an initial distribution which has the $\xi_1$ axis as an axis of symmetry, and which has no mass at the points $\xi_1 = \pm 1$. The distribution of $\xi(t)$ for any $t > 0$ is then also symmetric about the $\xi_1$ axis. The random variable $X(t) = \xi_1(t)$, which is the projection of $\xi(t)$ on the $\xi_1$ axis, is evidently a Markov process with continuous path functions and with state space $-1 \le x \le 1$. To calculate the transition probability function of $X(t)$ we introduce polar coordinates

\[ \xi \sim (\theta_1, \theta_2, \ldots, \theta_N, \phi), \qquad \eta \sim (\theta_1', \theta_2', \ldots, \theta_N', \phi') \]

according to (13.31). The transition probability is clearly given by

\[ P(t, x, [-1, y]) = \Pr\{X(t) \le y \mid X(0) = x\} = \int_{-1 \le \eta_1 \le y} p(t, \xi, \eta)\,d\omega_{\eta}, \tag{13.34} \]

where $\xi$ is any fixed unit vector with $\xi_1 = x$. By symmetry considerations we take, without loss of generality, $\theta_1 = \cos^{-1}x$, $\theta_2 = \theta_3 = \cdots = \theta_N = 0$ so that $\langle\xi, \eta\rangle = \cos\theta_1\cos\theta_1' + \sin\theta_1\sin\theta_1'\cos\theta_2'$. Using the area element

\[ d\omega_{\eta} = (\sin\theta_1')^{N}(\sin\theta_2')^{N-1}\cdots(\sin\theta_N')\,d\theta_1'\,d\theta_2'\cdots d\theta_N'\,d\phi' \tag{13.35} \]

and the representation (13.33) together with (13.35) we obtain

\[ P(t, x, [-1, y]) = \int_{\cos\theta_1' \le y}(\sin\theta_1')^{N}\,d\theta_1'\sum_{n=0}^{\infty} e^{-\lambda_n t}\,\frac{\omega(N - 2)\,h(n)}{\omega(N)}\,I_n(\theta_1, \theta_1'), \]

where

\[ I_n(\theta_1, \theta_1') = \int_0^{\pi} P_n(\cos\theta_1\cos\theta_1' + \sin\theta_1\sin\theta_1'\cos\theta_2')(\sin\theta_2')^{N-1}\,d\theta_2'. \]

(This result has a slightly different form if $N = 0$.) Now by virtue of an identity of Gegenbauer§ we have

\[ I_n(\theta_1, \theta_1') = P_n(\cos\theta_1)P_n(\cos\theta_1')\int_0^{\pi}(\sin\theta_2')^{N-1}\,d\theta_2' \]

and it follows that

\[ P(t, x, [-1, y]) = c_N^{-1}\int_{-1}^{y}\sum_{n=0}^{\infty} e^{-\lambda_n t}h(n)P_n(x)P_n(z)\,(1 - z^2)^{(N-1)/2}\,dz, \]

whose integrand has the form of $p(t, x, z)$ in (13.17) with $\alpha = \beta = (N - 1)/2$ (and with $t$ replaced by $2t$), where $c_N = \int_{-1}^{+1}(1 - \xi^2)^{(N-1)/2}\,d\xi$.

† Erdélyi, “Higher Transcendental Functions,” Vol. II.
‡ See Erdélyi, p. 243.


14: The Concept of Stochastic Differential Equations 


The next three sections introduce the important subject of stochastic differential 
equations and stochastic integrals. The modern approach in dealing with dif- 
fusion processes is in these terms. Several chapters would be required to 
elaborate rigorously the concepts and techniques thereof. We discuss some of 
the principal ideas and methodology in an informal manner in Section 15, 
highlighting a variety of examples. In Section 16 several important results are 
stated that emphasize the martingale property and the power of the Ito trans- 
formation formula. 

The motion of a particle suspended in a fluid is influenced by two principal 
forces. First, a nonrandom (deterministic) motion can be engendered by the 
nature of the underlying fluid flow or induced by some external force impressed 
on the system. Second, collisions and/or more general interaction relationships 
with other particles cause generally random movements which over short time 
durations are often well described by Brownian motion fluctuations. Thus, for 
a small duration from time t to t + At, the displacement of the particle (say 
along the x axis) is approximated by 


\[ X(t + \Delta t) - X(t) \approx \mu(x, t)\,\Delta t + \sigma(x, t)\,\Delta B(t), \tag{14.1} \]

where $X(t) = X(t, \omega) = x$ is the location of the particle at time $t$ (we shall ordinarily suppress the variable $\omega$ identifying the sample path realization). Here $\mu(x, t)$ is the instantaneous velocity of the fluid at time $t$ and position $x$, while $\Delta B(t) = B(t + \Delta t) - B(t)$ is the incremental change associated with the standard Brownian motion process $B(t)$, and $\sigma^2(x, t) > 0$ measures the instantaneous variance associated with the collisions of the $X(t)$ process. The first term reflects the movement caused by the deterministic forces while the second term expresses the random component of the motion.

§ See G. Szegö, “Orthogonal Polynomials,” p. 369. Colloquium Publications Series, Amer. Math. Soc., Providence, Rhode Island, 1959.

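Since $B(t)$ involves continuous sample paths, we can infer that $X(t)$ is a continuous Markov process, provided $\mu(x, t)$ and $\sigma(x, t)$ are appropriately continuous deterministic functions. In fact the Markov property is implied by the independent increments of Brownian motion. From (14.1) we infer, heuristically, that $\{X(t), t \ge 0\}$ constitutes a diffusion process with infinitesimal drift coefficient $\mu(x, t)$ and infinitesimal variance (diffusion coefficient) $\sigma^2(x, t)$. Indeed, we have

\[ \lim_{\Delta t \downarrow 0}\frac{1}{\Delta t}E[\Delta X] = \lim_{\Delta t \downarrow 0}\frac{1}{\Delta t}E[\mu(x, t)\,\Delta t + \sigma(x, t)\,\Delta B(t)] = \mu(x, t) \tag{14.2} \]

(using the fact that $E[\Delta B] = 0$) and

\[
\begin{aligned}
\lim_{\Delta t \downarrow 0}\frac{1}{\Delta t}E[(\Delta X)^2] &= \lim_{\Delta t \downarrow 0}\frac{1}{\Delta t}\bigl\{\operatorname{var}(\Delta X) + (E[\Delta X])^2\bigr\}\\
&= \lim_{\Delta t \downarrow 0}\frac{1}{\Delta t}\operatorname{var}[\sigma(x, t)\,\Delta B] + \lim_{\Delta t \downarrow 0}\frac{1}{\Delta t}(E[\Delta X])^2\\
&= \lim_{\Delta t \downarrow 0}\frac{1}{\Delta t}\,\sigma^2(x, t)\,\Delta t = \sigma^2(x, t),
\end{aligned}
\tag{14.3}
\]

since $(E[\Delta X])^2 = O(\Delta t)^2$ and $\operatorname{var}[\Delta B] = \Delta t$.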


It is tempting to continue the foregoing reasoning and concomitantly evaluate the limit of $\Delta X/\Delta t$, but this random variable has variance of the order $1/\Delta t$. Notice that $\lim_{\Delta t \to 0}(1/\Delta t)\operatorname{var}(\Delta X)$ exists with a finite limit while

\[ \lim_{\Delta t \to 0}\operatorname{var}\!\left(\frac{\Delta X}{\Delta t}\right) \]

does not exist, and consequently convergence of $\Delta X/\Delta t$ is, a fortiori, precluded. To assign meaning to the limit relation attendant to (14.1), that is,

\[ \frac{dX(t)}{dt} = \mu(x, t) + \sigma(x, t)\frac{dB}{dt}, \tag{14.4} \]

it is necessary to develop an extended version of stochastic differentials and integrals. Relation (14.4) is preferably written in the differential notation

\[ dX = \mu(x, t)\,dt + \sigma(x, t)\,dB, \tag{14.5} \]

in which transformations on $dX$ are more easily performed.


WHITE NOISE 


The “process” dB(t)/dt = W(t) (recall from Chapter 7 that the Brownian 
paths, although almost certainly continuous, are nowhere differentiable) is 
commonly referred to as the white noise “process” for a number of reasons, 
several of which we now elaborate. 

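Consider the correlation of the incremental displacements $\Delta_h B(t) = B(t + h) - B(t)$ and $\Delta_k B(s) = B(s + k) - B(s)$ corresponding to two nonoverlapping epochs $(t, t + h)$ and $(s, s + k)$, where $t + h < s$ and $h, k > 0$.

The independent increment property of $B(t)$ guarantees

\[ E\!\left[\frac{\Delta_h B(t)}{h}\,\frac{\Delta_k B(s)}{k}\right] = 0, \tag{14.6} \]

while

\[ E\!\left[\left(\frac{\Delta_h B(t)}{h}\right)^2\right] = \frac{1}{h} \to \infty \quad\text{as } h \downarrow 0. \]

Thus, we might expect the Dirac delta function

\[ E\!\left[\frac{dB(t)}{dt}\,\frac{dB(s)}{ds}\right] = \begin{cases}\infty, & t = s,\\ 0, & t \ne s,\end{cases} \tag{14.7} \]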
signifying that the continuous time stationary process dB/dt consists of a con- 
tinuum of random variables, each random variable corresponding to a parameter 
value of t, which are all uncorrelated (and even mutually independent) and whose 
sample paths in some sense occur as the derivative of the realizations of Brownian 
motion. Of course, Brownian motion paths are not differentiable. There is no 
simple construction to realize such a process dB/dt. The analog of dB/dt in 
discrete time is simply a sequence of independent random variables, each norm- 
ally distributed with zero means and unit variances. The spectral density of such 
a sequence (see Section 7, Chapter 9) is constant over the range $[0, 2\pi]$, signifying that all frequencies appear equally, and thus motivating the description
“white noise.” Difficulties arise in establishing a well-defined version of a process 
whose values at all points of a continuous time parameter are independent. 
Modern mathematics has developed a framework for differentiating all con- 
ceivable functions, even those usually considered nondifferentiable. These 
developments are encompassed by the theory of Schwartz distributions. We 
shall not follow this tack although it is somewhat pertinent to the present context.

In continuous time, white noise is not a physical process but is an abstraction wherein all frequencies over the range $(0, \infty)$ appear equally. What is germane
for applications is to replace W(t) by a smoother stationary Gaussian process 
with an essentially flat spectral density over a broad range of frequencies. For a 
Gaussian process Y(t) having covariance function 


\[ R(t, s) = E[Y(t)Y(s)] \]

(to simplify the notation we have stipulated $E[Y(t)] = 0$) for which $\rho(t, s) = \partial^2 R(t, s)/\partial t\,\partial s$ is continuous, the derivative process $\dot Y(t) = dY/dt$ is meaningful and constitutes a Gaussian process completely characterized by the covariance function

\[ \rho(t, s) = \frac{\partial^2 R(t, s)}{\partial t\,\partial s} = E[\dot Y(t)\dot Y(s)]. \tag{14.8} \]

In fact, expanding the expectation yields

\[ E\!\left[\frac{Y(t + h) - Y(t)}{h}\cdot\frac{Y(s + k) - Y(s)}{k}\right] = \frac{R(t + h, s + k) - R(t, s + k) - R(t + h, s) + R(t, s)}{hk}. \tag{14.9} \]

Letting $h$ and $k$ go to zero, the right-hand side converges to $\partial^2 R(t, s)/\partial t\,\partial s$. It follows that $[Y(t + h) - Y(t)]/h$ converges in mean square, and we then define $\dot Y(t)$ as the limit. It suffices to have $R(t, s)$ continuously differentiable on the diagonal $t = s$, since then $\dot Y(t)$ exists for all $t$. When the process $Y(t)$ is chosen appropriately, then the derivative process $\dot Y(t)$ will approximate white noise.

To compare with Brownian motion, recall that for it the covariance is

\[ R_B(t, s) = \min(t, s), \tag{14.10} \]

which is not differentiable along the diagonal ($t = s$). The formal derivative is

\[ \frac{\partial^2 R_B(t, s)}{\partial t\,\partial s} = \begin{cases}\infty & \text{if } t = s,\\ 0 & \text{if } t \ne s.\end{cases} \tag{14.11} \]

(Compare with (14.7).)

The identification (14.11) compared to (14.7) suggests a consideration of the formal derivative $dB/dt = W(t)$ as a continuous time stationary Gaussian process with uncorrelated (and hence independent) points.


STOCHASTIC DYNAMICAL SYSTEMS 
Extending (14.4), the solution of a differential equation of the form 


\[ \frac{dX(t)}{dt} = f(t, X(t), W(t)), \tag{14.12} \]

where $f$ is a real-valued (generally smooth) function, is called a continuous
stochastic dynamical system. The random disturbing function W(t) is called a 
random forcing (driving, input) factor. The initial conditions in (14.12) may be 
fixed or random with known distribution function. Problems 6-11 deal with a 
number of discrete versions (difference equations) of (14.12). 

A wide range of physical and engineering models expressed by the dynamic 
relation (14.12) can be found in numerous electrical engineering texts con- 
cerned with stochastic control, filtering, extrapolation, and prediction studies. 

The analysis of (14.12) can be transferred to that of an equivalent integral 
equation, 


\[ X(t) - X(0) = \int_0^t f(\tau, X(\tau), W(\tau))\,d\tau, \tag{14.13} \]


provided one can appropriately define the limit involved in the integral. 

We shall mostly concentrate on the special case of (14.12) of the type (14.4). 
In order to secure solutions of (14.12), or more specifically (14.4), we shall 
adapt to the stochastic context the classical method of successive approxima- 
tions basic to solving differential equations. The simplest example of (14.12) has 
the right-hand side independent of W(t) and randomness entering only in the 
initial conditions. 

Another common physical model involves only a random nonhomogeneous 
component where (14.12) specifically takes the form 


\[ \dot X(t) = \frac{dX(t)}{dt} = f(X(t), t) + W(t). \tag{14.14} \]

A well-studied physical system of the sort (14.14) is the mass spring linear oscillator driven by white noise. The displacement position corresponds to a solution of the second-order differential equation

\[ \ddot X(t) + 2\beta\dot X(t) + \gamma^2X(t) = \frac{dB}{dt} = W(t), \tag{14.15} \]

where $\beta$ (resistance parameter) and $\gamma$ (inertial constant) are real constants. Introducing the vector

\[ \mathbf{X}(t) = \begin{pmatrix} X(t)\\ \dot X(t)\end{pmatrix}, \]

then (14.15) can be cast in the vector form

\[ \dot{\mathbf{X}}(t) = \begin{pmatrix} 0 & 1\\ -\gamma^2 & -2\beta\end{pmatrix}\mathbf{X}(t) + \mathbf{W}(t) \tag{14.16} \]

with

\[ \mathbf{W}(t) = \begin{pmatrix} 0\\ W(t)\end{pmatrix}, \]

and (14.16) can be regarded as a vector example of (14.14).

It is frequently more relevant to replace the right-hand side of (14.15) by a
more general stationary Gaussian process Y(t) with a known covariance 
function, something other than white noise. For Y(t) as prescribed, the solution 
X(t) is generally no longer a Markov process. 


The Ornstein-Uhlenbeck Process as a Solution of a Dynamical Stochastic 
Differential Equation 
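We investigate the differential equation

\[ \frac{dX(t)}{dt} = -\beta X(t) + W(t), \qquad X(0) = 0 \quad (\beta > 0), \tag{14.17} \]

where the input process $\{W(t), t \ge 0\}$ is Gaussian white noise, $W(t) = dB(t)/dt$. This equation can be solved by direct methods. We rewrite (14.17), employing the obvious integrating factor, yielding

\[ \frac{d}{dt}\bigl[e^{\beta t}X(t)\bigr] = e^{\beta t}\frac{dB(t)}{dt}. \]

Integrating both sides and invoking $X(0) = 0$ produces

\[ e^{\beta t}X(t) = \int_0^t e^{\beta\tau}\frac{dB(\tau)}{d\tau}\,d\tau. \tag{14.18} \]

Next we integrate by parts the right-hand side to get

\[ e^{\beta t}X(t) = e^{\beta t}B(t) - \beta\int_0^t e^{\beta\tau}B(\tau)\,d\tau, \]

that is,

\[ X(t) = B(t) - \beta\int_0^t e^{-\beta(t-\tau)}B(\tau)\,d\tau. \tag{14.19} \]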


The representation of X(t) in (14.19) as a linear operator on B(t) implies that 
X(t) is a Gaussian process (see Section 8 of Chapter 9). It remains to ascertain 
the mean and covariance of X(t). Clearly, 


\[ E[X(t)] = E[B(t)] - \beta\int_0^t e^{-\beta(t-\tau)}E[B(\tau)]\,d\tau = 0. \]


Next we formally compute E[X(t)X(s)], which is done most expeditiously 
proceeding from (14.18). We obtain for t < s 


\[
\begin{aligned}
e^{\beta(t+s)}E[X(t)X(s)] &= \int_0^t\!\!\int_0^s e^{\beta(\xi+\eta)}E[W(\xi)W(\eta)]\,d\eta\,d\xi\\
&= \int_0^t e^{2\beta\xi}\,d\xi \qquad (\text{since } E[W(\xi)W(\eta)]\,d\xi\,d\eta = \delta(\xi - \eta)\,d\xi\,d\eta)\\
&= \frac{e^{2\beta t} - 1}{2\beta},
\end{aligned}
\]

and then

\[ E[X^2(t)] = \frac{1 - e^{-2\beta t}}{2\beta}. \]

With a general initial condition $X(0) = x$, we find that the solution $\{X(t), t \ge 0\}$ to (14.17) is a Gaussian process with moments

\[ E[X(t)] = xe^{-\beta t} \qquad\text{and}\qquad \operatorname{var}[X(t)] = \frac{1 - e^{-2\beta t}}{2\beta}. \tag{14.20} \]

Comparing this with Example C of Section 2 of this chapter we recognize $X(t)$ as the Ornstein-Uhlenbeck process.
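
The moment formulas (14.20) can be examined by simulation. The following Python sketch is an illustration under stated assumptions: it replaces the white-noise equation (14.17) by a simple Euler discretization with Gaussian increments of variance `dt` (a standard numerical scheme, not a method used in the text), and the function names, step counts, path counts, and seed are all arbitrary choices made here.

```python
import math
import random


def simulate_ou_endpoint(beta, t_end, n_steps, rng, x0=0.0):
    """One Euler path of dX = -beta X dt + dB; returns X(t_end)."""
    dt = t_end / n_steps
    sd = math.sqrt(dt)  # standard deviation of a Brownian increment over dt
    x = x0
    for _ in range(n_steps):
        x += -beta * x * dt + sd * rng.gauss(0.0, 1.0)
    return x


def sample_mean_var(beta, t_end, x0=0.0, n_paths=4000, n_steps=200, seed=12345):
    """Sample mean and variance of X(t_end) over independent simulated paths."""
    rng = random.Random(seed)
    xs = [simulate_ou_endpoint(beta, t_end, n_steps, rng, x0) for _ in range(n_paths)]
    mean = sum(xs) / n_paths
    var = sum((v - mean) ** 2 for v in xs) / (n_paths - 1)
    return mean, var


if __name__ == "__main__":
    beta, t_end, x0 = 1.0, 1.0, 0.5
    mean, var = sample_mean_var(beta, t_end, x0)
    print("sample mean", mean, "theory", x0 * math.exp(-beta * t_end))
    print("sample var ", var, "theory", (1.0 - math.exp(-2.0 * beta * t_end)) / (2.0 * beta))
```

The sample values agree with (14.20) only up to Monte Carlo error and a small discretization bias, both of which shrink as the number of paths and steps grows.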


STOCHASTIC DIFFERENTIAL EQUATIONS 


The modeling of physical and biological systems by stochastic differential 
equations of the form 


\[ dX(t) = \mu(X(t), t)\,dt + \sigma(X(t), t)\,dB(t) \tag{14.21} \]


in terms of an input (driving) white noise process has two principal merits. 
First, it is usually amenable to mathematical analysis. An important fact is that 
the solution of (14.21), as implicit in the discussion of (14.4), determines a diffusion process with infinitesimal drift $\mu(x, t)$ and infinitesimal variance $\sigma^2(x, t)$. Second, although the white noise process is a mathematical abstraction, it well approximates a variety of noise or other background and environmental random input processes that occur naturally in physical and biological
contexts. The resistance noise in certain electronic systems often yields a spectral 
density (see Chapter 9) for the voltage source which is flat for a wide range of 
frequencies. 

The solution of the stochastic differential equation (14.21) is in terms of a stochastic integral $\int\sigma(X(t), t)\,dB(t)$. There are two prominent versions of stochastic
integrals: one called the Ito integral (abbreviated the I-integral) and a second 
introduced by Stratonovich (called the S-integral). The construction and the 
most pertinent properties of the I-integral are set forth in Section 16. The concept 
and manipulations of the S-integral are reduced to calculations of related I- 
integrals. Apart from its value in solving stochastic differential equations, the 
Ito indefinite integral generates an important class of martingales as will be noted 
later (Section 16). However, the transformation properties of the Ito stochastic 
process are not concordant with the rules of ordinary calculus, which is perhaps 
slightly disconcerting. On the other hand, the solution of a stochastic differential 
equation in terms of the S-integral conforms more closely to the solution of the 
differential equation along each sample path induced by the random input 
process. The possibility of solving the stochastic differential equation pathwise 
is more satisfactory for many physical and biological systems and often more 
appropriate for the evaluation of data and the objectives of the model. The
S-integral generally differs from the I-integral by a corrective term. The precise
relationships of the two integrals will be clarified later in this section. 

Another important element in favor of the S-integral is as follows. Many 
natural discrete time, discrete state physical models are often approximated by 
stochastic differential systems or diffusion processes. In many cases the limiting 
process leads to a diffusion identified with the S-integral interpretation of the 
associated differential equation, rather than the I-integral. These contrasts are 
amply illustrated in the next section. 


THE TRANSFORMATION LAW FOR THE ITO STOCHASTIC DIFFERENTIAL 


We now proceed to determine the differential $dY(t)$ for $Y(t) = f(X(t), t)$ ($f$ smooth), where $X(t)$ is a solution of

\[ dX(t) = \mu(X(t), t)\,dt + \sigma(X(t), t)\,dB(t) \tag{14.22} \]

in the I sense. We carry out this task in a formal manner. The key to the analysis is the order relation

\[ [dB(t)]^2 \approx dt, \tag{14.23} \]

whose rationale stems from the formula

\[ \lim_{n\to\infty}\sum_{i=0}^{n-1}\bigl[B(t_{i+1}^{(n)}) - B(t_i^{(n)})\bigr]^2 = t, \tag{14.24} \]

provided

\[ 0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_n^{(n)} = t \]

and

\[ \lim_{n\to\infty}\max_{0\le i\le n-1}\bigl[t_{i+1}^{(n)} - t_i^{(n)}\bigr] = 0 \]

hold. Relation (14.24) is always valid in the mean square sense. We derived (14.24) in Chapter 7 (Section 7) when $\{t_i^{(n)}\}$ was the partition of $[0, 1]$ delineated by the associated dyadic rationals. In that circumstance the convergence prevails for almost all sample path realizations as well as in the mean square sense.
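
The limit (14.24) can be observed directly by simulation. The sketch below is illustrative only: it uses a uniform partition of $[0, t]$ (one admissible choice among many), draws independent Gaussian increments with variance equal to the mesh, and sums their squares. The function name, the seed, and the partition sizes are assumptions made for this example.

```python
import math
import random


def quadratic_variation(t_end, n_steps, seed=2024):
    """Sum of squared Brownian increments over a uniform partition of [0, t_end]."""
    rng = random.Random(seed)
    sd = math.sqrt(t_end / n_steps)  # each increment is N(0, t_end / n_steps)
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(n_steps))


if __name__ == "__main__":
    t_end = 2.0
    for n in (10, 100, 1000, 20000):
        print(n, quadratic_variation(t_end, n))
```

Since the sum has mean $t$ and variance $2t^2/n$ for a uniform partition into $n$ pieces, the printed values should cluster ever more tightly around $t$ as $n$ grows, which is the mean square convergence asserted above.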

The total differential of $f(X(t), t) = Y(t)$, where $f$ is sufficiently smooth, is evaluated by applying Taylor's expansion:

\[
\begin{aligned}
dY(t) = df(X(t), t) ={}& f_x(X(t), t)\,dX(t) + f_t(X(t), t)\,dt\\
&+ \tfrac12 f_{xx}(X(t), t)[dX(t)]^2 + f_{xt}(X(t), t)\,dX\,dt\\
&+ \tfrac12 f_{tt}(X(t), t)(dt)^2 + \cdots
\end{aligned}
\tag{14.25}
\]

(abbreviating $f_x = \partial f/\partial x$, $f_t = \partial f/\partial t$, etc.). Substituting for $dX$ from (14.22), invoking (14.23), and neglecting higher order terms we obtain

\[
\begin{aligned}
dY(t) ={}& \bigl[f_x(X(t), t)\mu(X(t), t) + f_t(X(t), t) + \tfrac12 f_{xx}(X(t), t)\sigma^2(X(t), t)\bigr]\,dt\\
&+ f_x(X(t), t)\sigma(X(t), t)\,dB(t).
\end{aligned}
\tag{14.26}
\]

In integral form (14.26) is written

\[
\begin{aligned}
Y(t) - Y(0) ={}& \int_0^t\bigl[f_x(X(\tau), \tau)\mu(X(\tau), \tau) + f_t(X(\tau), \tau) + \tfrac12 f_{xx}(X(\tau), \tau)\sigma^2(X(\tau), \tau)\bigr]\,d\tau\\
&+ \int_0^t f_x(X(\tau), \tau)\sigma(X(\tau), \tau)\,dB(\tau),
\end{aligned}
\tag{14.27}
\]

referred to as the Ito stochastic transformation formula, where the final integral is interpreted as an I-integral.

The extra contribution to $dt$ comes from the factor associated with $[dB(t)]^2 = dt$ owing to (14.23). A number of applications of (14.27) will be developed in Sections 15 and 16.
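
A short worked instance may help fix the role of the extra term. The derivation below is a sketch under simple assumptions chosen here for illustration: $X(t) = B(t)$ with $B(0) = 0$ (so the drift is zero and the diffusion coefficient is one in (14.22)) and $f(x, t) = x^2$.

```latex
% Worked instance of (14.26)-(14.27), assuming X(t) = B(t), B(0) = 0,
% i.e. \mu \equiv 0 and \sigma \equiv 1 in (14.22), and f(x, t) = x^2.
\begin{align*}
f_x &= 2x, \qquad f_{xx} = 2, \qquad f_t = 0,\\
d\bigl[B(t)^2\bigr] &= \bigl[2B(t)\cdot 0 + 0 + \tfrac12\cdot 2\cdot 1\bigr]\,dt + 2B(t)\cdot 1\,dB(t)
                    = dt + 2B(t)\,dB(t),\\
B(t)^2 &= t + 2\int_0^t B(\tau)\,dB(\tau)
\quad\Longrightarrow\quad
\int_0^t B(\tau)\,dB(\tau) = \tfrac12\bigl[B(t)^2 - t\bigr].
\end{align*}
```

Ordinary calculus would give $\tfrac12 B(t)^2$ for the last integral; the additional $-\tfrac12 t$ is exactly the contribution of $[dB(t)]^2 = dt$.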

From many physical perspectives, since white noise in reality does not exist, it might be reasonable to suppose $[dB(t)]^2$ negligible compared to $dt$, and this property is inherent to the Stratonovich integral interpretation of $S\text{-}\!\int\sigma(X(t), t)\,dB(t)$. (The notation S- signifies that the integral is evaluated in the Stratonovich sense. Without S- the Ito definition applies.)


LIMITS OF INTEGRALS WITH RANDOM PROCESSES APPROXIMATING WHITE NOISE (WONG-ZAKAI 
LIMIT THEOREM) 

In practical terms the white noise process is intended to represent a stationary 
Gaussian process with a spectral density that is flat over a wide range of fre- 
quencies. If we take W(t) to be such a process, then there is no difficulty in 
treating (14.12) as an ordinary differential equation for each realization of the 
W(t) process, provided the sample paths are well behaved. Consider a sequence 
of Gaussian processes B,(t) having piecewise differentiable sample paths. For 
each n and specified sample path, the equation 


\[ \frac{dX_n(t)}{dt} = \mu(X_n, t) + \sigma(X_n, t)\frac{dB_n(t)}{dt}, \qquad X_n(0) = x, \tag{14.28} \]


can in principle be solved by standard means. Suppose now that $B_n$ converges appropriately (as $n \to \infty$) to Brownian motion. If, independent of the approximation of $B_n$ to $B$, the sequence $X_n(t)$, $0 \le t \le T$, also converges in a suitable sense to a process $X(t)$, $0 \le t \le T$, then we can propose $X(t)$ as a solution of

\[ \frac{dX(t)}{dt} = \mu(X, t) + \sigma(X, t)W(t) \qquad\text{for } 0 \le t \le T, \]

where $W(t)$ is Gaussian white noise.

A natural choice is to take $B_n(t)$ as a polygonal approximation to $B(t)$ defined for each sample path by

\[ B_n(t) = B(t_i^{(n)}) + \bigl[B(t_{i+1}^{(n)}) - B(t_i^{(n)})\bigr]\frac{t - t_i^{(n)}}{t_{i+1}^{(n)} - t_i^{(n)}} \qquad\text{for } t_i^{(n)} \le t \le t_{i+1}^{(n)}, \tag{14.29} \]

where

\[ 0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_n^{(n)} = T, \qquad \max_i\bigl(t_{i+1}^{(n)} - t_i^{(n)}\bigr) \to 0 \quad\text{as } n \to \infty. \]

The integral of (14.28) gives
The integral of (14.28) gives

$$ X_n(t) - X_n(0) = \int_0^t \mu(X_n(\tau), \tau)\,d\tau + \int_0^t \sigma(X_n(\tau), \tau)\,dB_n(\tau). \tag{14.30} $$

Stipulating that $X_n(t)$ converges to X(t) appropriately, and provided $\mu(x, t)$
is sufficiently smooth, the justification of the relation

$$ \lim_{n \to \infty} \int_0^t \mu(X_n(\tau), \tau)\,d\tau = \int_0^t \mu(X(\tau), \tau)\,d\tau \tag{14.31} $$

is indeed forthcoming. On the other hand, the ascertainment of the limit of
$\int_0^t \sigma(X_n(\tau), \tau)\,dB_n(\tau)$ poses some difficulties. We express the
limit of (14.30) in terms of appropriate Ito integrals.

We proceed in a heuristic fashion. Define

$$ \Psi(x, t) = \int_0^x \psi(\xi, t)\,d\xi $$

($\psi(\xi, t)$ to be determined explicitly later). The total differential then becomes

$$ d\Psi(x, t) = \Psi_t(x, t)\,dt + \Psi_x(x, t)\,dx = \Psi_t(x, t)\,dt + \psi(x, t)\,dx \tag{14.32} $$

[$\Psi_x(x, t) = \psi(x, t)$ by the fundamental theorem of calculus].
Evaluating at $x = X_n(t)$ and substituting from (14.28) gives

$$
\begin{aligned}
d\Psi(X_n(t), t) &= \Psi_t(X_n(t), t)\,dt + \psi(X_n(t), t)\,dX_n(t) \\
&= \big[\Psi_t(X_n(t), t) + \psi(X_n(t), t)\,\mu(X_n(t), t)\big]\,dt
 + \psi(X_n(t), t)\,\sigma(X_n(t), t)\,dB_n(t).
\end{aligned}
\tag{14.33}
$$

Specifying $\psi(\xi, t) = 1/\sigma(\xi, t)$, the coefficient of $dB_n$ in (14.33) becomes
identically 1. Now, integration produces


$$ \Psi(X_n(t), t) - \Psi(X_n(a), a) = \int_a^t \Gamma(X_n(\tau), \tau)\,d\tau + B_n(t) - B_n(a), \tag{14.34} $$

where $\Gamma$ is the coefficient of dt in (14.33).

If $X_n(t)$ converges to X(t) and $B_n(t)$ to B(t) appropriately, then passing to the
limit formally in (14.34) leads to

$$ \Psi(X(t), t) - \Psi(X(a), a) = \int_a^t \Gamma(X(\tau), \tau)\,d\tau + B(t) - B(a) \tag{14.35} $$

for all a and t > a. If we assume that X(t) admits the representation

$$ X(t) = X(a) + \int_a^t g(X(s), s)\,ds + \int_a^t h(X(s), s)\,dB(s), \tag{14.36} $$

signifying that X is a solution of the Ito stochastic differential equation

$$ dX = g(X(t), t)\,dt + h(X(t), t)\,dB, $$

and apply the stochastic transformation formula (14.27) to $\Psi(X(t), t)$, we obtain

$$
\begin{aligned}
\Psi(X(t), t) - \Psi(X(a), a) ={}& \int_a^t \Big\{ \Psi_x(X(s), s)\,g(X(s), s) + \Psi_t(X(s), s)
 + \tfrac12 \Psi_{xx}(X(s), s)\,[h(X(s), s)]^2 \Big\}\,ds \\
& + \int_a^t \Psi_x(X(s), s)\,h(X(s), s)\,dB(s).
\end{aligned}
\tag{14.37}
$$


Comparing (14.37) and (14.35), since a < t is arbitrary we deduce that

$$ \Psi_x(X(s), s)\,h(X(s), s) = 1 $$

and

$$
\begin{aligned}
\Psi_x(X(s), s)\,g(X(s), s) + \Psi_t(X(s), s) + \tfrac12 \Psi_{xx}(X(s), s)\,[h(X(s), s)]^2
&= \Gamma(X(s), s) \\
&= \Psi_t(X(s), s) + \frac{\mu(X(s), s)}{\sigma(X(s), s)}.
\end{aligned}
\tag{14.38}
$$

The first relation of (14.38) shows that

$$ h(X(s), s) = \frac{1}{\Psi_x(X(s), s)} = \sigma(X(s), s), \tag{14.39} $$

the last equation resulting from the choice of $\Psi_x(\xi, t) = \psi(\xi, t) = 1/\sigma(\xi, t)$. The
second equation of (14.38) reduces to

$$ \Psi_x(X(s), s)\,g(X(s), s) + \tfrac12 \Psi_{xx}(X(s), s)\,[h(X(s), s)]^2 = \frac{\mu(X(s), s)}{\sigma(X(s), s)} = \mu(X(s), s)\,\Psi_x(X(s), s). $$

Therefore

$$
\begin{aligned}
g(X(s), s) &= \mu(X(s), s) - \tfrac12 \Psi_{xx}(X(s), s)\,[h(X(s), s)]^3 \\
&= \mu(X(s), s) + \tfrac12 \sigma_x(X(s), s)\,\sigma(X(s), s)
\end{aligned}
\tag{14.40}
$$

since

$$ \Psi_x(\xi, s) = \frac{1}{\sigma(\xi, s)} \qquad \text{and} \qquad \Psi_{xx}(\xi, s) = -\frac{\sigma_x(\xi, s)}{[\sigma(\xi, s)]^2}. $$
Thus, by interpreting formally the solution of

$$ dX(t) = \mu(X(t), t)\,dt + \sigma(X(t), t)\,dB(t) \tag{14.41} $$

as a limit of solutions of a sequence of equations of the type (14.28), we find
that the solution agrees with the Ito solution of the modified stochastic
differential equation

$$ dX(t) = \big[\mu(X(t), t) + \tfrac12 \sigma_x(X(t), t)\,\sigma(X(t), t)\big]\,dt + \sigma(X(t), t)\,dB(t) \tag{14.42} $$

involving the correction term $\tfrac12 \sigma_x \sigma$ contributing to the infinitesimal drift
coefficient. The solution obtained in this manner is referred to as the
Wong-Zakai solution of (14.41). The Wong-Zakai solution of (14.41) also coincides
with the S-solution; that is, with the process X(t) satisfying

$$ X(t) - X(0) = \int_0^t \mu(X(s), s)\,ds + \text{S-}\!\int_0^t \sigma(X(s), s)\,dB(s), $$

where the stochastic integral is the Stratonovich integral, as symbolized by
writing S-$\int$. [See (14.50) which follows.] An appealing facet of the Wong-Zakai
approach or the S-solution is that the transformation rule for differentials in
(14.41) is now concordant with classical differential calculus.

Observe that where $\sigma(x, t)$ is independent of x, and therefore $\sigma_x = 0$, the
S-solution and I-solution of (14.41) coincide.

A special case similar to (14.28) concerns the limit of $X_n$ satisfying

$$ dX_n = \sigma(B_n(t), t)\,dB_n, $$

where $B_n(t)$ is a sequence of smooth processes converging appropriately to B(t).
Equivalently, we would like to evaluate

$$ \lim_{n \to \infty} \int_0^t \sigma(B_n(\tau), \tau)\,dB_n(\tau). \tag{14.43} $$

It suffices to assume that $B_n(t)$ has a continuous derivative in t.
The computation of (14.43) is analogous to that of (14.30), leading to

$$ \lim_{n \to \infty} \int_0^t \sigma(B_n(\tau), \tau)\,dB_n(\tau) = \int_0^t \sigma(B(\tau), \tau)\,dB(\tau) + \tfrac12 \int_0^t \sigma_x(B(\tau), \tau)\,d\tau \tag{14.44} $$

involving a corrective term in the second integral on the right as in (14.42).
Since the integral on the left in (14.44) coincides with the Stratonovich integral,
we have

$$ \text{S-}\!\int_0^t \sigma(B(\tau), \tau)\,dB(\tau) = \int_0^t \sigma(B(\tau), \tau)\,dB(\tau) + \tfrac12 \int_0^t \sigma_x(B(\tau), \tau)\,d\tau. $$

We summarize the analysis leading to (14.42) by highlighting the relationship
of the S- and I-solutions of the stochastic differential equation

$$ dX = \mu(X(t), t)\,dt + \sigma(X(t), t)\,dB. \tag{14.45} $$


Provided the coefficients $\mu$ and $\sigma$ are sufficiently smooth, the Ito solution of
(14.45) is a diffusion process having infinitesimal coefficients $\mu(x, t)$ and $\sigma^2(x, t)$.
The S-solution of (14.45) can be obtained as the I-solution of the modified
differential equation

$$ dX = \big[\mu(X, t) + \tfrac12 \sigma_x(X, t)\,\sigma(X, t)\big]\,dt + \sigma(X, t)\,dB. \tag{14.46} $$

In view of the facts connected with (14.45) and (14.46), the S-solution is again
a diffusion having the same variance coefficient while the drift coefficient is
altered to

$$ \mu(x, t) + \tfrac12 \sigma_x(x, t)\,\sigma(x, t). $$

The behavior of the two solutions can be markedly different. For much modeling
of physical systems the S-solution interpretation appears to be more natural,
while in biological contexts both perspectives can apply depending on the
discrete approximation structure.
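
The passage from (14.45) to (14.46) can be sketched numerically. The helper below is
illustrative only: its name, the finite-difference step h, and the sample coefficients
are assumptions introduced here, not part of the text. It returns the drift of the Ito
equation whose solution coincides with the S-solution of (14.45).

```python
def ito_drift_from_stratonovich(mu, sigma, x, t, h=1e-5):
    """Drift mu + (1/2) sigma_x sigma of the modified Ito equation (14.46).

    mu and sigma are callables (x, t) -> float giving the coefficients of (14.45).
    sigma_x is approximated by a central difference with step h (an assumption
    made for this sketch; an analytic derivative can be used instead).
    """
    sigma_x = (sigma(x + h, t) - sigma(x - h, t)) / (2.0 * h)
    return mu(x, t) + 0.5 * sigma_x * sigma(x, t)


# Hypothetical coefficients: linear drift and linear (multiplicative) noise.
alpha, gamma = 0.3, 0.8
linear_mu = lambda x, t: alpha * x
linear_sigma = lambda x, t: gamma * x

# For sigma(x) = gamma * x the correction is (1/2) gamma^2 x.
corrected = ito_drift_from_stratonovich(linear_mu, linear_sigma, 2.0, 0.0)
```

When sigma does not depend on x the correction vanishes, in agreement with the remark
that the S-solution and I-solution then coincide.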


SOME FORMALITIES ON STOCHASTIC INTEGRALS 


The solution of the stochastic differential equation (14.45) involves stochastic
integrals of the form

$$ \int_0^t \varphi(\tau, \omega)\,dB(\tau, \omega) = \int_0^t \varphi(\tau)\,dB(\tau), \tag{14.47} $$

where $\omega$ denotes a sample path and $\varphi(t, \omega) = \varphi(t)$ is a stochastic process
with suitable smoothness properties, but B(t) is not of bounded variation, and
therefore the integral cannot be constructed in any standard sense. Nevertheless,
it is possible to define (14.47) by the limiting process

$$ \operatorname*{l.i.m.}_{n \to \infty,\ \max_i (t_{i+1}^{(n)} - t_i^{(n)}) \to 0}\ \sum_{i=0}^{n-1} \varphi(t_i^{(n)})\,\big[B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big] \tag{14.48} $$

(l.i.m. signifies limit in mean square). In the sum (14.48), it is vital for the Ito
definition of the integral to take the value of $\varphi(\tau)$ over $t_i^{(n)} \le \tau < t_{i+1}^{(n)}$ at the
smallest allowable $\tau$, namely, $t_i^{(n)}$.

The function $\varphi(t, \omega)$ is assumed to be measurable with respect to
$\mathcal{F}_t = \sigma(B_\tau, \tau \le t)$ [that is, $\varphi(t, \omega)$ is determined completely by the
realization of the process $B(\tau)$ up to time t], and provided
$\int_0^t E[\varphi(\tau)^2]\,d\tau < \infty$ holds (a natural growth restriction), it
can be proved that (14.48) exists. A remarkable result is that the indefinite integral
$X(t) = X(t, \omega) = \int_0^t \varphi(\tau)\,dB(\tau)$ determines a martingale adapted to the
$\sigma$-fields $\{\mathcal{F}_t\}$.

A symmetrical style integral has been proposed by Stratonovich in which the
approximating sum is put in the symmetric form

$$ \sum_{i=0}^{n-1} \tfrac12 \big[\varphi(t_{i+1}^{(n)}) + \varphi(t_i^{(n)})\big]\,\big[B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big]. \tag{14.49} $$

Under more restrictive conditions on $\varphi(t, \omega)$ than required in (14.48), the
approximating sums of (14.49) converge to the Stratonovich integral

$$ \text{S-}\!\int_0^t \varphi(\tau)\,dB(\tau). \tag{14.50} $$


The indefinite integral based on (14.50) no longer generates a martingale but
inherits more appealing transformation properties. The main one is that if
X(t) solves

$$ dX(t) = \mu(X(t), t)\,dt + \sigma(X(t), t)\,dB(t) $$

entailing the S-integral, then Y(t) = f(X(t), t) satisfies the standard differential
relation

$$ dY(t) = f_t(X(t), t)\,dt + f_x(X(t), t)\,dX(t), \tag{14.51} $$

where all differentials and integrals are now interpreted in the S sense.

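It is illuminating to compare the evaluation of the I-integral and the
S-integral for the function $\varphi(t) = B(t)$.

Observe for a partition $\{t_i^{(n)}\}$ of [0, t]

$$
\begin{aligned}
J_n &= \sum_{i=0}^{n-1} \tfrac12 \big[B(t_{i+1}^{(n)}) + B(t_i^{(n)})\big]\,\big[B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big] \\
&= \tfrac12 \sum_{i=0}^{n-1} \big\{ [B(t_{i+1}^{(n)})]^2 - [B(t_i^{(n)})]^2 \big\} \\
&= \tfrac12 \big\{ [B(t)]^2 - [B(0)]^2 \big\} = \tfrac12 [B(t)]^2
\end{aligned}
\tag{14.52}
$$

independent of the choice of $\{t_i^{(n)}\}$.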
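We wish next to evaluate

$$ \lim_{n \to \infty} I_n = \lim_{n \to \infty} \sum_{i=0}^{n-1} B(t_i^{(n)})\,\big[B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big]. $$

To this end, we note the elementary identity

$$ 2J_n - \sum_{i=0}^{n-1} \big[B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big]^2 = 2I_n. \tag{14.53} $$

We pointed out earlier, see (14.24), that

$$ \sum_{i=0}^{n-1} \big[B(t_{i+1}^{(n)}) - B(t_i^{(n)})\big]^2 \to t \qquad \text{as } n \to \infty \tag{14.54} $$

(cf. Chapter 7). The facts of (14.52) and (14.54) together clearly establish by
means of (14.53) that

$$ I_n \ \text{converges to} \ \tfrac12 [B(t)]^2 - \tfrac12 t. $$

More generally we have, reminiscent of (14.44),

$$ \text{S-}\!\int_0^t \varphi(B(\tau), \tau)\,dB(\tau) = \int_0^t \varphi(B(\tau), \tau)\,dB(\tau) + \tfrac12 \int_0^t \varphi_x(B(\tau), \tau)\,d\tau, \tag{14.55} $$

which we have just verified for $\varphi(t, \omega) = B(t)$.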
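
The comparison of the two approximating sums can be checked on a simulated path. The
sketch below is an illustration under stated assumptions: the function names, the
equally spaced partition, and the use of Python's standard pseudo-random generator to
produce Brownian increments are choices made here, not part of the text.

```python
import random


def brownian_path(n, t, seed=0):
    """Values B(t_0), ..., B(t_n) on an equally spaced partition of [0, t], B(0) = 0."""
    rng = random.Random(seed)
    dt = t / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))
    return path


def ito_sum(path):
    """I_n: integrand B taken at the left endpoint of each interval, as in (14.48)."""
    return sum(path[i] * (path[i + 1] - path[i]) for i in range(len(path) - 1))


def stratonovich_sum(path):
    """J_n: integrand B averaged over both endpoints, as in (14.49)."""
    return sum(0.5 * (path[i + 1] + path[i]) * (path[i + 1] - path[i])
               for i in range(len(path) - 1))


def quadratic_variation(path):
    """Sum of squared increments, the quantity appearing in (14.53) and (14.54)."""
    return sum((path[i + 1] - path[i]) ** 2 for i in range(len(path) - 1))
```

For any path, stratonovich_sum(path) equals half the square of the final value up to
rounding, and the identity (14.53) holds exactly; ito_sum(path) approaches
(1/2)B(t)^2 - (1/2)t only as the partition is refined.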


Example. The Growth Equation. We shall solve the equation

$$ dX = X\,dB \tag{14.56} $$

in the two senses of the S-integral and the I-integral. To this end we observe in
accordance with the discussion of (14.46) that the S-solution of (14.56) is
acquired as the Ito solution of

$$ dX = \tfrac12 X\,dt + X\,dB. \tag{14.57} $$

In order to solve this equation we appeal to the transformation formula (14.27).
Consider

$$ f(x, t) = e^{x - t/2}. \tag{14.58} $$

By virtue of (14.26) and the trivial statement dB = dB we find for this f that
f(B(t), t) = X(t) satisfies

$$ dX(t) = \big(f_t + \tfrac12 f_{xx}\big)\,dt + f_x\,dB(t) = X(t)\,dB(t), \tag{14.59} $$

which is precisely (14.56). Because the solution of (14.56) [with initial condition
X(0) = 1] is unique (Section 16), we can conclude that

$$ X(t) = e^{B(t) - t/2} $$

is the required solution when interpreted as an Ito stochastic differential
equation.
On the other hand

$$ Y(t) = e^{B(t)} = g(B(t)) \tag{14.60} $$

satisfies (by (14.26)) the Ito equation

$$ dY(t) = \tfrac12 Y(t)\,dt + Y(t)\,dB(t) $$

and also (compare to (14.57)) the S-equation

$$ dY(t) = Y(t)\,dB(t) \qquad \text{(S)}. \tag{14.61} $$

The meaning of the S-differential equation in (14.61) is found in its integrated
form

$$ Y(t) - Y(0) = \text{S-}\!\int_0^t Y(\tau)\,dB(\tau). $$

Another approach to (14.56) proceeding by means of ordinary calculus yields

$$ \frac{dY}{Y} = dB, $$

whose solution pathwise is

$$ \log \frac{Y(t)}{Y(0)} = B(t) $$

or

$$ Y(t) = e^{B(t)} \qquad (\text{for } Y(0) = 1), $$

which agrees with the S-solution of (14.56).
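
The two solutions of the growth equation can be compared along one simulated Brownian
path. The sketch below is a hedged illustration: the left-endpoint (Euler) step mirrors
the Ito sum (14.48), while the averaged predictor-corrector (Heun) step is a standard
numerical device assumed here to mimic the symmetric sum (14.49); neither scheme, nor
the function name, appears in the text.

```python
import math
import random


def simulate_growth(n, t, seed=0):
    """Return (B(t), Ito-type approximation, Stratonovich-type approximation)
    for dX = X dB with X(0) = 1, using n equal steps on [0, t]."""
    rng = random.Random(seed)
    dt = t / n
    b = 0.0
    x_ito = 1.0
    y_str = 1.0
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        # Left-endpoint step: coefficient evaluated before the increment.
        x_ito += x_ito * db
        # Heun step: average the coefficient at the current and predicted states.
        y_pred = y_str + y_str * db
        y_str += 0.5 * (y_str + y_pred) * db
        b += db
    return b, x_ito, y_str
```

With a fine partition the first approximation should be close to exp(B(t) - t/2) and
the second close to exp(B(t)), in line with the I-solution and S-solution found above.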


QUALITATIVE SUMMARY AND DISCUSSION OF DISCRETE AND CONTINUOUS STOCHASTIC 
MODELING 


Many phenomena in control schemes, biology, economics, engineering systems, 
physics, and other areas can be modeled by differential equations with stochastic 
perturbation terms. (Section 15 elaborates a number of concrete examples.) 
The stochastic differential equation

$$ \frac{dX(t)}{dt} = \varphi(X(t), t) + \psi(X(t), t)\,\frac{dB(t)}{dt} \qquad \Big(\frac{dB}{dt} = W(t), \text{ the white noise process}\Big) \tag{14.62} $$

or in differential notation

$$ dX(t) = \varphi(X(t), t)\,dt + \psi(X(t), t)\,dB, \tag{14.63} $$


can be interpreted in a number of ways which are not all equivalent. 

A difficulty attendant to (14.62) already underscored in our earlier discussion 
is that the white noise process dB(t)/dt = W(t) is not well defined. Three 
approaches have been principally set forth in seeking to resolve stochastic 
differential equations. 


(I) Constructing a refined series of discretized versions of (14.63) and
proposing the limit process where well defined to represent the solution of
(14.63). There can be inconsistencies between diverse discretizations which
lead to different limit processes. The interpretation may then depend on the
relevance of a specific approximation procedure and the solution is accordingly
model dependent. This consideration can be germane to the implications and
interpretations of the analyses.

(II) Method I suggests approximating the differential equations by discrete
time systems, in this way circumventing the ambiguity in the meaning of W(t).
An alternative approach is to approximate W(t) directly, replacing dB/dt = W(t)
by a sequence of processes $W_n(t)$ converging to W(t) in some sense where all $W_n$
are sufficiently smooth, permitting the solution of

$$ \frac{dX_n(t)}{dt} = \varphi(X_n(t), t) + \psi(X_n(t), t)\,W_n(t) \tag{14.64} $$

along each sample path by the standard techniques of differential equations.
When the processes $X_n(t)$ converge [say to X(t)], then X(t) can be referred to as
the solution of (14.63). Unfortunately in some cases, depending on the
coefficients $\varphi(x, t)$ and $\psi(x, t)$, the limit process X(t) could again depend on the
nature of the approximation $W_n(t)$.

(III) A direct mathematical approach converts the differential equation
(14.63) into the integral equation

$$ X(t) = X(0) + \int_0^t \varphi(X(\tau), \tau)\,d\tau + \int_0^t \psi(X(\tau), \tau)\,dB(\tau), \tag{14.65} $$

where the last integral is interpreted in the Ito sense. Subject to reasonable
growth restrictions and smoothness conditions (see Section 16) it can be
established that (14.65) admits a unique solution on a prescribed time segment
$0 \le t \le T$ where the initial condition X(0) is specified. The solution of (14.63),
or equivalently (14.65), obtained by these means is called the Ito solution.

(IV) If the integral in (14.65) is taken in the Stratonovich sense the solution
is achieved by incorporating into (14.63) an additional term in the dt part and
then ascertaining its Ito solution. The modified equation often provides a
process with behavior significantly different from the Ito solution of the original
equation that lacks the corrective term.

A great advantage in the use of continuous stochastic differential equations 
versus discrete models in describing certain physical, engineering, biological, 
or economic processes is that explicit answers are frequently accessible in the 
continuous formulations. The dependence and sensitivity of the process on the 
parameters are therefore more easily discernible and interpretable. The process 
realizations (or expectation, variance, or distributional quantities) for the 
corresponding discrete models rarely admit explicit representations and so 
their qualitative discussion is more formidable. On the other hand, for compu- 
tational, numerical or simulation objectives, the benefits of a good discrete 
formulation are clear. 

We now discuss each of these approaches more specifically. The technique
extends the methods of standard differential equations to a framework entailing
random coefficients and perturbation terms.

Consider first some aspects of approach I.

(i) A classical numerical technique used in dealing with ordinary differential
equations is to construct the Cauchy polygonal lines. More specifically, consider
a sequence of partitions $\Pi_n = \{t_k^{(n)}\}$,

$$ 0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_n^{(n)} = T \tag{14.66} $$

with maximal mesh size

$$ \delta_n = \max_{0 \le i \le n-1} \big(t_{i+1}^{(n)} - t_i^{(n)}\big). $$

In the case of (14.63) Khasminski proposed to define the discrete approximation
at the grid points recursively by the prescription

$$
\begin{aligned}
X^{(n)}(t_{k+1}^{(n)}) = X_{k+1}^{(n)} ={}& X_k^{(n)} + \varphi(X_k^{(n)}, t_k^{(n)})\,\big(t_{k+1}^{(n)} - t_k^{(n)}\big) \\
& + \psi(X_k^{(n)}, t_k^{(n)})\,\big[B(t_{k+1}^{(n)}) - B(t_k^{(n)})\big], \qquad k = 0, 1, 2, \ldots, n - 1,
\end{aligned}
\tag{14.67}
$$

with $X_0^{(n)} = X^{(n)}(0)$ fixed and $X^{(n)}(t)$ extended "linearly" between the partition
points in the manner

$$
\begin{aligned}
X^{(n)}(t) ={}& X^{(n)}(t_k^{(n)}) + \varphi(X_k^{(n)}, t_k^{(n)})\,\big(t - t_k^{(n)}\big) \\
& + \psi(X_k^{(n)}, t_k^{(n)})\,\big[B(t) - B(t_k^{(n)})\big], \qquad t_k^{(n)} \le t \le t_{k+1}^{(n)}.
\end{aligned}
\tag{14.68}
$$

This is not a linear extension since the adjunct $B(t) - B(t_k^{(n)})$ is obviously
nonlinear in t over $t_k^{(n)} \le t \le t_{k+1}^{(n)}$.

Subject to suitable smoothness requirements on $\varphi(x, t)$ and $\psi(x, t)$ it can
be proved that as $\delta_n \to 0$,

$$ X^{(n)}(t) \ \text{converges appropriately to the Ito solution } X(t) \text{ of (14.63).} \tag{14.69} $$
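
The recursion (14.67) translates directly into a short routine. The sketch below is
illustrative: the function name and argument layout are assumptions made here, and the
Brownian values on the grid are taken as given input rather than simulated.

```python
def cauchy_polygon(phi, psi, x0, times, brownian):
    """Grid values X_0, ..., X_n produced by the recursion (14.67).

    phi, psi : callables (x, t) -> float, the coefficients of (14.63)
    times    : partition t_0 < t_1 < ... < t_n
    brownian : values B(t_0), ..., B(t_n) on the same partition
    """
    xs = [x0]
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        db = brownian[k + 1] - brownian[k]
        x = xs[-1]
        # Both coefficients are frozen at the left endpoint (X_k, t_k).
        xs.append(x + phi(x, times[k]) * dt + psi(x, times[k]) * db)
    return xs
```

For example, with phi identically zero and psi(x, t) = x on the grid [0, 0.5, 1] and
Brownian values [0, 0.3, -0.1], the routine returns 1, 1.3, and 1.3 * (1 - 0.4) = 0.78.
Freezing the coefficients at the left endpoint is what ties this scheme to the Ito
interpretation in (14.69).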


(ii) A natural smoothening of the white noise process is to modify the
Brownian trajectories by polygonal approximations at the sequence of grids
(14.66). More specifically, we define the polygonal sample path

$$ B_n(t) = B(t_k^{(n)}) + \frac{B(t_{k+1}^{(n)}) - B(t_k^{(n)})}{t_{k+1}^{(n)} - t_k^{(n)}}\,\big(t - t_k^{(n)}\big), \qquad t_k^{(n)} \le t \le t_{k+1}^{(n)}, \tag{14.70} $$

or

$$ W_n(t) = \frac{dB_n(t)}{dt} = \frac{B(t_{k+1}^{(n)}) - B(t_k^{(n)})}{t_{k+1}^{(n)} - t_k^{(n)}}, \qquad t_k^{(n)} < t < t_{k+1}^{(n)}. \tag{14.71} $$

The dynamic equation analog of (14.70) with $W_n(t)$ replacing dB/dt is

$$ \frac{dX_n(t)}{dt} = \varphi(X_n(t), t) + \psi(X_n(t), t)\,W_n(t), \tag{14.72} $$

which is well defined except at the discrete set $\{t_k^{(n)}\}$ between which
$W_n(t) = W_n(t, \omega)$ is a random piecewise constant trajectory. Subject to standard
regularity stipulations on $\varphi(x, t)$ and $\psi(x, t)$ and treating (14.72) as an ordinary
differential equation for each sample realization, we solve this equation uniquely
producing the process $X_n(t) = X_n(t, \omega)$. In the presence of further smoothness
restrictions on $\varphi$ and $\psi$, the Wong-Zakai construction underlying (14.72)
allows the existence of a limit process X(t) such that
$\lim_{n \to \infty} E[\{X_n(t) - X(t)\}^2] = 0$, and the probability limit

$$ \Pr\Big\{ X(t) = \lim_{n \to \infty} X_n(t) \ \text{uniformly for } 0 \le t \le T \Big\} = 1 $$

holds along an appropriate subsequence.

The above limit process X(t) coincides with the Stratonovich solution of 
(14.63). Generally but not always a limiting process associated with a smoothened 
process (where one shifts from white noise to a smooth Gaussian process) leads 
to a Stratonovich interpretation after evaluations through ordinary integrals. 
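
The construction behind (14.72) can be imitated numerically by solving an ordinary
differential equation on each grid interval, with the white noise replaced by the
constant slope of the polygonal path. The sketch below is an illustration under
assumptions: the function name, the choice of the classical fourth-order Runge-Kutta
method, and the number of substeps are all introduced here and are not prescribed by
the text.

```python
def wong_zakai_endpoint(phi, psi, x0, times, brownian, substeps=4):
    """Value at times[-1] of the solution of dX/dt = phi(X, t) + psi(X, t) * W_n(t),
    where W_n is the slope of the polygonal path through (times[k], brownian[k])."""
    x = x0
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        w = (brownian[k + 1] - brownian[k]) / dt  # piecewise constant W_n on this interval

        def rhs(x_, t_):
            return phi(x_, t_) + psi(x_, t_) * w

        h = dt / substeps
        t = times[k]
        for _ in range(substeps):
            # One classical Runge-Kutta (RK4) step of size h.
            k1 = rhs(x, t)
            k2 = rhs(x + 0.5 * h * k1, t + 0.5 * h)
            k3 = rhs(x + 0.5 * h * k2, t + 0.5 * h)
            k4 = rhs(x + h * k3, t + h)
            x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
            t += h
    return x
```

For phi = 0 and psi(x, t) = x the interval-by-interval solution is X multiplied by
exp of the Brownian increment, so the endpoint value is x0 * exp(B(T)) for every grid.
This is the Stratonovich solution of dX = X dB, not the Ito solution
x0 * exp(B(T) - T/2).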

The Stratonovich integral and the Stratonovich differential satisfy most of 
the conventional rules of the integral and differential calculus with respect to 
transformation formula and chain rule, the identity attendant to integration by 
parts, etc. This affords easy manipulation of the Stratonovich integral using 
familiar operations. A drawback of the Stratonovich integral contrasted to that 
of the Ito construct is that we lose the martingale property for the indefinite 
integral process. 

Computations of expectations and moments are much easier with the Ito 
integral than with the Stratonovich integral. Generally, most theoretical 
work is more conveniently done in the Ito framework, exploiting heavily the 
martingale endowments of the Ito integral. Also, the Ito integral is defined for a 
significantly wider class of functions than that of the Stratonovich integral. 
In addition, the restrictions on the coefficients of the Stratonovich stochastic 
differential equation compared to the corresponding Ito stochastic differential 
equation are much more severe. In many natural cases (e.g., certain nonlinear 
filtering theory) a solution exists in the Ito setting but not for the associated 
Stratonovich stochastic differential equation. 

We have described in (14.45) and (14.46) the transformations from Stratono- 
vich to Ito for the stochastic integral and the conversions between the associated 
stochastic differential equations. 


15: Some Stochastic Differential Equation Models 


We shall describe two classes of examples, the first based on population ecological 
and genetic systems, the second involving economic structures and examples of 
stochastic control. 


A. POPULATION GROWTH IN A RANDOM ENVIRONMENT 


There is considerable interest in stochastic analogs of classical difference and
differential equations describing phenomena in theoretical ecology and
population genetics. In population genetics a number of recent studies have
focused on models involving selection coefficients which vary systematically or
randomly in time (cf. Case (g) of Section 2) in conjunction with random sampling
effects reflecting small population size fluctuations. Most of the ecological
models incorporate randomness into their formulation by starting with the
population growth model


$$ \frac{dN}{dt} = f(N) + g(N)\,W(t), \qquad W(t) = \frac{dB(t)}{dt}, \tag{15.1} $$
where N(t) is the population size at time t and W(t) is the Gaussian white noise 
process. The simplest concrete case considers the “exponential” growth equation 


$$ \frac{dN}{dt} = r(t)\,N(t), \tag{15.2} $$

where r(t) represents an instantaneous rate of growth at time t. To account for
random environmental effects, it is stipulated that

$$ r(t) = \alpha + \gamma W(t), $$

with W(t) as in (15.1) and $\alpha$ and $\gamma$ constants.
The stochastic differential equation (15.2) becomes

$$ \frac{dN(t)}{dt} = N(t)\Big[\alpha + \gamma\,\frac{dB(t)}{dt}\Big] \tag{15.3} $$

or in differential symbols,

$$ dN(t) = N(t)\big(\alpha\,dt + \gamma\,dB(t)\big) = \alpha N(t)\,dt + \gamma N(t)\,dB(t). $$

The solution of (15.3) in terms of the I-integral produces a diffusion process on
$(0, \infty)$ characterized by the drift and variance coefficients

$$ \mu(x) = \alpha x \qquad \text{and} \qquad \sigma^2(x) = \gamma^2 x^2 \qquad \text{for } 0 < x < \infty, \tag{15.4} $$

respectively, as can be seen formally from (14.21).

Recall from Example D, Section 2, that the process defined by
$Y(t) = e^{\mu t + \sigma B(t)}$ is geometric Brownian motion, i.e., the diffusion process on
$I = (0, \infty)$ with infinitesimal parameters $\mu(y) = (\mu + \tfrac12 \sigma^2) y$ and
$\sigma^2(y) = \sigma^2 y^2$. Comparing this with (15.4) we see that the solution to
$dN = \alpha N\,dt + \gamma N\,dB$ in the Ito sense is the geometric Brownian motion
$N(t) = \exp[(\alpha - \tfrac12 \gamma^2) t + \gamma B(t)]$.

In another approach, because N(t) factors on the right, we can solve along
each sample path by writing (15.3) in the form

$$ \frac{dN}{N} = \alpha\,dt + \gamma\,dB(t) $$

and then integrating both sides to get

$$ \log \frac{N(t)}{N(0)} = \alpha t + \gamma B(t) $$

or

$$ N(t) = N(0)\,e^{\alpha t + \gamma B(t)}. \tag{15.5} $$

This approach leads to a different geometric Brownian motion. The infinitesimal
mean and variance, respectively, are computed as above to be

$$ \mu(x) = (\alpha + \tfrac12 \gamma^2)\,x \qquad \text{and} \qquad \sigma^2(x) = \gamma^2 x^2, \qquad 0 < x < \infty. \tag{15.6} $$

In comparing (15.6) with (15.4) notice that the drift term entails a contribution
from the white noise process. In fact, the coefficients are exactly those obtained
by interpreting (15.3) in the Stratonovich sense.

The sample path behavior of the processes with diffusion coefficients (15.4)
and (15.6) can differ significantly. In particular, if $0 < \alpha < \tfrac12 \gamma^2$, then following
(15.4), $N(t) \to 0$ as $t \to \infty$, while following (15.6), $N(t) \to \infty$. The relevance of
the method of solution of (15.3) is paramount in the interpretation.
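
The contrast between the two interpretations can be made concrete with the closed
forms above. The sketch below is illustrative: the function names and the sample
parameter values are assumptions made here. Since B(t)/t tends to zero along almost
every path, the exponent per unit time is governed by alpha - gamma^2/2 in the Ito case
and by alpha in the Stratonovich case.

```python
import math


def ito_growth(alpha, gamma, t, b_t, n0=1.0):
    """Ito solution of dN = alpha N dt + gamma N dB at time t, given B(t) = b_t."""
    return n0 * math.exp((alpha - 0.5 * gamma ** 2) * t + gamma * b_t)


def stratonovich_growth(alpha, gamma, t, b_t, n0=1.0):
    """Pathwise solution (15.5) at time t, given B(t) = b_t."""
    return n0 * math.exp(alpha * t + gamma * b_t)


# Hypothetical values with 0 < alpha < gamma^2 / 2, evaluated at a large time
# and at a Brownian value of typical size sqrt(t).
alpha, gamma, t = 0.1, 1.0, 400.0
b_t = math.sqrt(t)
small = ito_growth(alpha, gamma, t, b_t)           # exponent -0.4 * 400 + 20 = -140
large = stratonovich_growth(alpha, gamma, t, b_t)  # exponent  0.1 * 400 + 20 =  60
```

On a common Brownian path the two solutions always differ by the factor
exp(gamma^2 t / 2), whatever the value of B(t).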


The Stochastic Logistic Equation 


One form of a stochastic logistic equation is

$$ \frac{dN(t)}{dt} = N(t)\,[k(t) - N(t)] \qquad \text{where} \quad k(t) = \alpha + \gamma W(t) = \alpha + \gamma\,\frac{dB}{dt}. \tag{15.7} $$

Here we interpret $\alpha + \gamma W(t)$ as a stochastic analog of the carrying capacity
of an ecological habitat, usually considered constant.
Another version sets

$$ \frac{dN}{dt} = r(t)\,N - \beta(t)\,N^2, $$

where $\beta(t)$ measures the individual effects on reproduction by survival of others.
For r(t) = k(t) and $\beta(t) = 1$, we recover the model of (15.7). In terms of the
formulation (15.7) the intrinsic rate of increase r has been absorbed into the time
scale.

The solution of (15.7) by means of the I-integral is a diffusion on $(0, \infty)$ with
infinitesimal coefficients

$$ \mu(x) = x(\alpha - x) \qquad \text{and} \qquad \sigma^2(x) = \gamma^2 x^2. \tag{15.8} $$

The S-solution of (15.7) has the diffusion parameters

$$ \mu(x) = x(\alpha + \tfrac12 \gamma^2 - x) \qquad \text{and} \qquad \sigma^2(x) = \gamma^2 x^2. \tag{15.9} $$

Differences in the behavior of the sample paths of the two processes (15.8) as
against (15.9) are significant. For example, in the case (15.9) by the criteria of
Sections 5 and 6 we find a stationary probability density (a gamma distribution)
always exists having expected population size $E[N(\infty)] = \alpha$ and
$\mathrm{var}[N(\infty)] = \tfrac12 \alpha \gamma^2$, while the existence of a stationary density for the
process associated with (15.8) requires $2\alpha/\gamma^2 > 1$. The classification of the
boundary point 0 (yielding criteria for and the nature of population extinction)
also differs between the two models.
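
The stationary densities quoted above can be recovered by a short computation. The
derivation below is a sketch that assumes the usual form of the stationary density of a
one-dimensional diffusion, proportional to $\sigma^{-2}(x) \exp\{\int^x 2\mu(\xi)/\sigma^2(\xi)\,d\xi\}$,
which is presumably what the criteria of Sections 5 and 6 deliver; the symbols
$\pi_S$ and $\pi_I$ are introduced here for convenience.

```latex
% S-solution, coefficients (15.9):
\begin{align*}
\frac{2\mu(x)}{\sigma^2(x)}
  &= \frac{2(\alpha + \tfrac12\gamma^2 - x)}{\gamma^2 x}
   = \frac{2\alpha/\gamma^2 + 1}{x} - \frac{2}{\gamma^2}, \\
\pi_S(x)
  &\propto \frac{1}{\gamma^2 x^2}\, x^{2\alpha/\gamma^2 + 1} e^{-2x/\gamma^2}
   \propto x^{2\alpha/\gamma^2 - 1} e^{-2x/\gamma^2},
\end{align*}
% a gamma density with shape 2\alpha/\gamma^2 and scale \gamma^2/2, so that
\begin{align*}
E[N(\infty)] &= \frac{2\alpha}{\gamma^2}\cdot\frac{\gamma^2}{2} = \alpha, &
\operatorname{var}[N(\infty)] &= \frac{2\alpha}{\gamma^2}\cdot\frac{\gamma^4}{4} = \tfrac12\alpha\gamma^2 .
\end{align*}
% I-solution, coefficients (15.8):
\begin{align*}
\pi_I(x) \propto x^{2\alpha/\gamma^2 - 2} e^{-2x/\gamma^2},
\end{align*}
% which is integrable near x = 0 only when 2\alpha/\gamma^2 - 2 > -1, i.e. 2\alpha/\gamma^2 > 1.
```

The exponent of x drops by one in passing from the S-solution to the I-solution, which
is why the S-density is integrable for every $\alpha > 0$ while the I-density needs
$2\alpha/\gamma^2 > 1$.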


B. GENE FREQUENCY FLUCTUATION 


Consider a large population composed of two types $A_1$ and $A_2$ and let p(t)
be the fraction of type $A_1$ at time t. If the relative intrinsic selective advantage of
type $A_1$ over $A_2$ is the ratio 1 + s to 1, then a classical deterministic equation
describing the changes in gene frequency has the form

$$ \frac{dp}{dt} = s\,p(1 - p). \tag{15.10} $$

The rationale for (15.10) is based on the following discrete time model. Let $p_n$
be the frequency of $A_1$ in the nth generation, where generations are spaced $\Delta t$
time units apart. As a result of natural selection forces, the relative proportions
of $A_1$ to $A_2$ in the (n + 1)st generation are $(1 + s\,\Delta t)\,p_n$ to $1 - p_n$. Converting
back to frequencies, we obtain

$$ p_{n+1} = \frac{(1 + s\,\Delta t)\,p_n}{(1 + s\,\Delta t)\,p_n + (1 - p_n)} = \frac{p_n (1 + s\,\Delta t)}{1 + p_n s\,\Delta t}. \tag{15.11} $$

If $\Delta t$ is small, allowing us to neglect orders of $(\Delta t)^2$, we obtain

$$
\begin{aligned}
\Delta p = p_{n+1} - p_n &= \frac{(1 + s\,\Delta t)\,p_n}{(1 + s\,\Delta t)\,p_n + 1 - p_n} - p_n
 = \frac{p_n (1 - p_n)\,s\,\Delta t}{1 + p_n s\,\Delta t} \\
&\approx p_n (1 - p_n)\,s\,\Delta t.
\end{aligned}
\tag{15.12}
$$

Dividing by $\Delta t$ and letting $\Delta t$ shrink to zero leads to the differential equation
of (15.10).
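
The passage from the generation recursion (15.11) to the differential equation (15.10)
can be checked numerically. The sketch below is illustrative: the function names and
parameter values are assumptions made here, and the closed form used for comparison is
the standard logistic solution of (15.10), which the text does not write out.

```python
import math


def iterate_selection(p0, s, dt, generations):
    """Apply the recursion (15.11) the given number of times, starting from p0."""
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s * dt) / (1.0 + p * s * dt)
    return p


def logistic_solution(p0, s, t):
    """Solution of dp/dt = s p (1 - p) with p(0) = p0."""
    e = math.exp(s * t)
    return p0 * e / (1.0 - p0 + p0 * e)


# Hypothetical values: same elapsed time t = 10, fine versus coarse generation spacing.
fine = iterate_selection(0.1, 0.5, 0.001, 10000)
coarse = iterate_selection(0.1, 0.5, 1.0, 10)
exact = logistic_solution(0.1, 0.5, 10.0)
```

Under (15.11) the odds p/(1 - p) are multiplied by 1 + s dt in each generation, so over
elapsed time t they grow by (1 + s dt)^(t/dt), which tends to exp(s t) as dt shrinks.
The fine-grid value should therefore sit much closer to the logistic solution than the
coarse-grid value.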

If the selection differential s dt is assumed to be a random process, then
(15.10) can be regarded as a stochastic differential equation

$$ dp(t) = p(t)\,(1 - p(t))\,dB(t), \tag{15.13} $$

where s dt is replaced by the white noise differential dB(t) whose expected
displacement and variance over the duration t to t + dt are, respectively,

$$ E[dB] = a\,dt \qquad \text{and} \qquad E[(dB(t))^2] = v^2\,dt. \tag{15.14} $$

The solution p(t) of (15.13), where dB(t)/dt is white noise satisfying (15.14),
conforms in the Ito sense to a diffusion process with coefficients

$$ \mu(x) = a\,x(1 - x) \qquad \text{and} \qquad \sigma^2(x) = v^2 x^2 (1 - x)^2 \qquad \text{for } 0 < x < 1. \tag{15.15} $$


Consider the zero drift case wherein a = 0. Then the points 0 and 1 of the 
diffusion engendered by (15.15) are attracting but unattainable boundaries, 
and therefore cannot be reached in finite time. Moreover, the realizations of the 
frequency process exhibit persistent oscillations such that there is substantial 
time spent in the vicinity of 0 and 1. It should be emphasized that almost all 
sample paths of the process oscillate back and forth between 0 and 1 infinitely 
often. 

The preceding analysis derived a deterministic differential equation and
then formed its stochastic analog. To consider another approach, we write (15.12)
in the form

$$ \Delta p(t) = \frac{p(t)\,(1 - p(t))\,\Delta S(t)}{1 + p(t)\,\Delta S(t)} = p(t)\,(1 - p(t))\,\Delta S(t)\,\big[1 - p(t)\,\Delta S(t) + \cdots\big], $$

where the selection process {S(t)} is a diffusion process satisfying

$$ E[\Delta S] = a\,\Delta t + o(\Delta t) \qquad \text{and} \qquad E[(\Delta S)^2] = v^2\,\Delta t + o(\Delta t). $$

Because the term involving $(\Delta S)^2$ is not negligible, we calculate

$$ E[\Delta p] = p(1 - p)(a - v^2 p)\,\Delta t \qquad \text{and} \qquad E[(\Delta p)^2] = p^2 (1 - p)^2 v^2\,\Delta t. $$

The gene frequency process now is described by a diffusion process whose
coefficients are

$$ \mu(x) = x(1 - x)(a - v^2 x) \qquad \text{and} \qquad \sigma^2(x) = v^2 x^2 (1 - x)^2 \qquad \text{for } 0 < x < 1. \tag{15.16} $$

Notice that the variance coefficient $v^2$ now contributes to the drift term for the
gene frequency process.


SOME STOCHASTIC DIFFERENTIAL MODELS OF ECONOMIC PROCESSES 


During this past decade there has been increasing effort to describe various 
facets of dynamic economic interactions with the help of stochastic differential 
processes. There are two aspects to this approach. Traditional mathematical 
economics modeling focuses on transient and equilibrium interrelationships 
among production and consumption factors. Stochastic differential processes 
provide a mechanism to incorporate the influences associated with random- 
ness, uncertainties, and risk factors operating with respect to various economic 
units (stock prices, labor force, technology variables, etc.). Secondly, modeling
the stochastic fluctuations in terms of continuous stochastic differential equations
(rather than in discrete time) often lends itself to explicit analysis with
concomitantly easier interpretation and screening for sensitivity to the
parameters.

Stochastic differential equation processes have been introduced in the study 
of three principal categories of economic phenomena: (a) description of growth 
of certain factors under uncertainty, (b) the nature of option price variations 
preserving certain market conditions, and (c) stochastic dynamic programming and 
control objectives. 

We shall present one or two typical examples for each category. 


C. PRODUCTION AND CONSUMPTION VARIATION UNDER UNCERTAINTY IN A ONE-SECTOR 
ECONOMY 


The variables of the model are as follows. 


(a) K(t) is the capital assets at time t. 

(b) L(t) is the labor force available at time t and is assumed proportional 
to population size. 

(c) C(t) is the consumption rate at time t. 


A production function F(K, L) is postulated obeying appropriate conditions.
In many situations a set of natural requirements is that F is concave and
homogeneous with respect to K and L. The example $F(K, L) = K^{\alpha} L^{1 - \alpha}$,
$0 < \alpha < 1$, is a classical choice known as the Cobb-Douglas production function.
The capital goods accumulation equation is expressed as

$$ \frac{dK}{dt} = F(K(t), L(t)) - \lambda K(t) - C(t), \tag{15.17} $$

where $\lambda$ is the rate of depreciation of capital (assumed to be nonnegative and
constant) and C(t) is the aggregate rate of consumption. Relation (15.17) is
locally certain and dK/dt is instantaneously determined when K, L, and C
are known.

In some situations it is natural to postulate that population size, or
equivalently L(t), is subject to random perturbations. Branching and certain related
limiting diffusion processes are the main processes used to describe the changes
in L(t). The most common model of growth [cf. (15.3)] stipulates that L satisfies
the stochastic differential equation

$$ dL(t) = L(t)\,[\alpha\,dt + \sigma\,dB(t)] \tag{15.18} $$

or, in accordance with (15.5),

$$ L(t)/L(0) = e^{(\alpha - \sigma^2/2)\,t + \sigma B(t)} \tag{15.19} $$

is geometric Brownian motion with an exponential drift. As developed in
Chapter 7, Section 4, L(t)/L(0) follows a log-normal distribution with moments

$$ E\Big[\log \frac{L(t)}{L(0)}\Big] = \Big(\alpha - \frac{\sigma^2}{2}\Big)\,t \qquad \text{and} \qquad \mathrm{Var}\Big[\log \frac{L(t)}{L(0)}\Big] = \sigma^2 t. \tag{15.20} $$

With the dynamics of labor size established according to (15.19) we can work
out the dynamics for the capital accumulation. To this end, it is judicious to
work in terms of the per capita variables. Accordingly, consider

$$
\begin{aligned}
k(t) &= \frac{K(t)}{L(t)} = \text{capital-labor ratio}, \\
c(t) &= \frac{C(t)}{L(t)} = \text{per capita consumption}.
\end{aligned}
\tag{15.21}
$$


Because F is homogeneous (of order 1), we have F(AK, AL) = AF(K, L), and 
we may introduce the per capita output function 


$$ f(k) = F(k, 1) = \frac{F(K, L)}{L}, \qquad k = K/L. \tag{15.22} $$

We further introduce the variable

$$ s(t) = 1 - \frac{c(t)}{f(k(t))} \qquad \text{(savings per unit output)} $$


in which we allow s(t) to be positive (real savings) or negative (debts). In order 
to achieve a one-dimensional diffusion we shall assume in this model that s(t) = s 
is a constant independent of the rate of output. Equivalently, the gross consump- 
tion rate per capita is a fixed fraction of the gross capital output. 

Consider now the function of the two variables L and K

$$ G = G(L, K) = \frac{K}{L} = k, \tag{15.23} $$

where K(t) is the function obeying locally the deterministic equation (15.17)
when L and C are known. Where L satisfies the stochastic differential equation
(15.18), the Ito transformation formula (14.26) gives

$$ dG(t) = \big(G_t + G_L L \alpha + \tfrac12 G_{LL} L^2 \sigma^2\big)\,dt + G_L L \sigma\,dB(t). \tag{15.24} $$

(Of course, $G_t = \partial G/\partial t$, $G_L = \partial G/\partial L$, $G_{LL} = \partial^2 G/\partial L^2$.)

A direct calculation gives

$$
\begin{aligned}
G_t = \frac{1}{L}\,\frac{dK}{dt} &= \frac{F(K, L)}{L} - \lambda\,\frac{K}{L} - \frac{C}{L} && \text{[by (15.17)]} \\
&= f(k) - \lambda k(t) - c(t) && \text{[see (15.21)]} \\
&= s(t)\,f(k(t)) - \lambda k(t) \\
&= s\,f(k(t)) - \lambda k(t)
\end{aligned}
$$

by the assumption that s(t) = s is constant. From the definition

$$ G_L L = -\frac{K}{L^2}\,L = -k. $$

Observe also that

$$ G_{LL} = \frac{2K}{L^3} = \frac{2k(t)}{L^2}. $$

Combining the preceding into (15.24) we get the diffusion

$$ dk = [s f(k) - \lambda k]\,dt - \alpha k\,dt + k \sigma^2\,dt - \sigma k\,dB. $$

Thus, the diffusion governing k(t) has infinitesimal parameters

$$ \mu(k) = s f(k) - (\lambda + \alpha - \sigma^2)\,k \qquad \text{and} \qquad \sigma^2(k) = \sigma^2 k^2. \tag{15.25} $$


D. MODEL OF OPTION (WARRANT) PRICES 


Assume the stock prices vary stochastically in accordance with the stochastic
differential equation

$$ dS = S\,[\mu\,dt + \sigma\,dB(t)]. \tag{15.26} $$

The warrant price (the price of the right to buy a stock over an appropriate
time horizon) is assumed to be a function of the form

$$ W = F(S, \tau), \tag{15.27} $$

where S is the current stock price and $\tau$ is the time currently remaining before
expiration in which to exercise the option. The policy sought is to determine the
function $F(S, \tau)$ in a manner so as to eliminate any uncertainty (risk) of the value
of the option price. The total investment I is assumed to be allocated among
warrants and stocks as

$$ I = F(S, \tau) + \alpha S, \tag{15.28} $$

where $\alpha$ units of stock is equivalent to 1 unit of a warrant. Applying the Ito
transformation formula to (15.27), we have (since $d\tau = -dt$, as time is measured
back from the instant that the option expires)


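\[ dW = \left(F_S \mu S - F_\tau + \tfrac12 \sigma^2 S^2 F_{SS}\right) dt + F_S \sigma S\, dB. \]
From (15.26) we have
\[ \alpha\, dS = \alpha S \mu\, dt + \alpha S \sigma\, dB. \tag{15.29} \]
The total rate of investment return (15.28) behaves according to
\[
\begin{aligned}
dI &= dW + \alpha\, dS \\
   &= \left[(\alpha + F_S)\mu S - F_\tau + \tfrac12 \sigma^2 S^2 F_{SS}\right] dt + (\alpha + F_S)\sigma S\, dB,
\end{aligned} \tag{15.30}
\]
whose risk term (i.e., coefficient of dB) is $(\alpha + F_S)\sigma S$. We determine
\[ \alpha = -F_S \tag{15.31} \]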


to eliminate the risk. Then the expected income per unit time is the coefficient
of dt, namely,
\[ \tfrac12 F_{SS} S^2 \sigma^2 - F_\tau, \]
which, in order to keep a stable market, should equal the total return rI from
secure investments. Thus, the available income for investment would yield rI and
by the portfolio policy of (15.28) should equal $r[F + \alpha S] = r[F - F_S S]$ since
$\alpha$ is determined by (15.31) for reasons indicated before. Under these market
conditions the resulting equation for F becomes
\[ \tfrac12 F_{SS} S^2 \sigma^2 - F_\tau = r[F - F_S S] \quad\text{or}\quad F_\tau = \tfrac12 S^2 \sigma^2 F_{SS} + r S F_S - rF. \tag{15.32} \]
The initial condition having S stock units available is
\[ F(S, 0) = \max(0, S - a), \]
where a is the fixed cost of exercising the purchase of the warrant.


We recognize (15.32) as the backward equation for a diffusion with infinitesimal
parameters
\[ \mu(S) = rS, \qquad \sigma^2(S) = S^2\sigma^2, \]
and killing rate function k(S) = r. The diffusion on $0 < S < \infty$ with
\[ \mu(S) = rS, \qquad \sigma^2(S) = \sigma^2 S^2 \tag{15.33} \]
is recognized as geometric Brownian motion (cf. Example D of Section 2).
Accordingly, we can represent the solution of (15.32) as the expectation
of the Kac functional [cf. (5.38) and (5.40)]
\[ F(S, \tau) = E_S[e^{-r\tau} h(X(\tau))], \quad\text{where } h(x) = \max(0, x - a), \]
with respect to X(t), the diffusion process corresponding to (15.33). (Problem
32 asks for the explicit solution.)
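
The Kac representation above lends itself to a direct numerical check. The following is a minimal sketch, not part of the original text: it estimates $E_S[e^{-r\tau}h(X(\tau))]$ by Monte Carlo, sampling $X(\tau)$ exactly from geometric Brownian motion with $\mu(S) = rS$ and $\sigma^2(S) = \sigma^2 S^2$, and compares the estimate with the well-known Black-Scholes expression for the same expectation. The function names, sample size, and parameter values are illustrative assumptions.

```python
import math
import random

def kac_price_mc(s0, a, r, sigma, tau, n_paths=200_000, seed=0):
    """Monte Carlo estimate of E_S[exp(-r*tau) * max(0, X(tau) - a)] for
    geometric Brownian motion with mu(S) = r*S, sigma^2(S) = sigma^2 * S^2."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * tau      # mean of log(X(tau)/s0)
    vol = sigma * math.sqrt(tau)              # standard deviation of log(X(tau)/s0)
    total = 0.0
    for _ in range(n_paths):
        x_tau = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(0.0, x_tau - a)          # h(x) = max(0, x - a)
    return math.exp(-r * tau) * total / n_paths

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def closed_form(s0, a, r, sigma, tau):
    """Black-Scholes value of the same expectation, used only as a check."""
    d1 = (math.log(s0 / a) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s0 * normal_cdf(d1) - a * math.exp(-r * tau) * normal_cdf(d2)
```

With, say, S = a = 100, r = 0.05, $\sigma$ = 0.2, and $\tau$ = 1, the two functions should agree to within Monte Carlo error.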


E. AN OPTIMAL GROWTH MODEL 


Let the output process be
\[ X(t) = F(K(t), L(t), Z(t)), \tag{15.34} \]
where K(t) is the capital, L(t) is labor, and Z(t) is the technology at time t.

A classical specification takes $F(K, L, Z) = K^{\alpha} L^{1-\alpha} \varphi(Z)$, the so-called
Cobb-Douglas form. The following relations are postulated:
\[ \frac{dK(t)}{dt} = X(t) - C(t) - \delta K(t) \quad\text{(cumulation equation)}, \tag{15.35} \]




where the rate of change of capital is proportional to production output minus
consumption minus depreciation of the capital goods at rate $\delta$. If population
growth is assumed constant then
\[ \frac{dL(t)}{dt} = rL(t). \tag{15.36} \]
The change in technologies is assumed to vary stochastically according
to the differential equation
\[ dZ(t) = \eta(Z)\, dt + \sigma(Z)\, dB(t). \tag{15.37} \]


Let U(C(t)) be the utility derived per individual from a rate of consumption 
C(t) at time t. The problem then is to determine C(t) subject to (15.34)-(15.37) 
which maximizes the total, properly discounted, expected utility: 


\[ E\left[\int_0^\infty U(C(t))\, e^{-\rho t}\, dt\right]. \tag{15.38} \]


This is a typical stochastic control problem of interest in economic theory. 


F. ANOTHER OPTIMAL STOCHASTIC ALLOCATION MODEL 
Let the assets at time t be $A_t$ and consist of two parts, those based in secure bonds
(as in banks) $B_t$ and the investment in equities (e.g., stocks) $K_t$. Then
\[ A_t = B_t p_t + K_t q_t, \tag{15.39} \]
where $p_t$ is the price of a unit bond and $q_t$ the price of an equity. We assume as a
first approximation that $q_t$ fluctuates according to
\[ \frac{dq_t}{q_t} = \eta\, dt + \sigma\, dB \tag{15.40a} \]
with a random component, while
\[ \frac{dp_t}{p_t} = r\, dt \tag{15.40b} \]
involves no random component.

The problem is to choose $B_t$, $K_t$, and $C_t$ ($C_t$ is the part of the holdings put
into consumption) to optimize an appropriate utility. We stipulate the dynamic
stochastic equation
\[ dA_t = B_t\, dp_t + K_t\, dq_t - C_t\, dt, \]
and substituting from (15.40),
\[ dA_t = A_t\{[\alpha r + (1 - \alpha)\eta]\, dt + (1 - \alpha)\sigma\, dB\} - C_t\, dt, \]
where by definition $\alpha = B_t p_t / A_t$. In this set-up $A_t$ (wealth) is given at every
instant and the problem is to choose $\alpha$ and $C_t$ to maximize the utility expectation
\[ E\left[\int_0^\infty U(C_t)\, e^{-\rho t}\, dt\right]. \]


G. A SIMPLE SIGNAL DETECTION MODEL 


It is common practice to model various electrical dynamical systems by means 
of stochastic differential equations. The underlying signal process is assumed to 
evolve following 

dX = F(t)X(t) dt + G(t) dB(t) (15.41) 
for the linear model and 

dX = f(X, t) dt + o(X, t) dB (15.42) 


for the nonlinear model. Usually (15.41) and (15.42) are vector processes. The
observable processes are governed by
\[ dZ = H(t)X\, dt + R(t)\, d\tilde{B}(t), \tag{15.43} \]
where $B$ and $\tilde{B}$ are independent Brownian motions. Observing Z(t) at a finite
number of time points, the objective is to obtain various statistical estimates of
the X(t) process and thereby extrapolate, interpolate, or appropriately smooth
the observations. There is an elaborate development in the communications
literature on these models; e.g., consult Wong [10].


16: A Preview of Stochastic Differential Equations 
and Stochastic Integrals 


We devote this section to highlighting a number of important facts of stochastic 
differential equations without proof. This subject is very much in the modern 
spirit for both theoretical and applied investigations of stochastic phenomena. 
The solutions of stochastic differential equations [driven by white noise
W(t) = dB/dt] are constructed in terms of stochastic integrals of the form
\[ \int_0^T f(t)\, dB(t), \tag{16.1} \]
where the f(t) are suitable random functions determined from the process
values $\{B(s), s \le t\}$. In attaching meaning to (16.1) it is impossible to employ
the standard calculus of integrals because almost every sample path of B(t) is
not of bounded variation. Stochastic integrals of the form (16.1) in which f(t) is
not random but deterministic were covered in Section 8 of Chapter 9.
Proceeding more exactly, the class of admissible functions f(t) considered in
(16.1) will be characterized in terms of the family of $\sigma$-fields $\{\mathcal{F}_t, t \ge 0\}$, where
$\mathcal{F}_t$ consists of all events determined by the realizations of B(s) over the time
interval $0 \le s \le t$. (An elaborate discussion of such collections of $\sigma$-fields is
given in Chapter 6, Section 8.) Accordingly, two realizations are identical in $\mathcal{F}_t$
if their paths coincide until time t, although they can differ afterward.
Mathematically, $\mathcal{F}_t$ is the smallest $\sigma$-field generated by the family of random variables
$\{B(s), 0 \le s \le t\}$. (Recall the concept of a $\sigma$-field generated by a random variable,
Chapter 6, Section 7.)

Let $\omega$ signify a particular realization of the process. We write $f(t) = f(t, \omega)$
to denote a real-valued random function, $\omega$ serving to emphasize, where desirable,
that f(t) is random. We say that f(t) is progressively measurable† if for every
$t \ge 0$ and real numbers $a < b$ the set $\{(\tau, \omega): 0 \le \tau \le t \text{ and } a < f(\tau, \omega) < b\}$
is in the product $\sigma$-field $\mathcal{B}[0, t] \otimes \mathcal{F}_t$, where $\mathcal{B}[0, t]$ denotes the Borel subsets
of [0, t]. Intuitively, the value of f(t) is determined by the Brownian path up
to time t and not by future values.

In order to define (16.1) we shall further require
\[ \int_0^T E[\{f(t)\}^2]\, dt < \infty. \tag{16.2} \]
The collection of all progressively measurable f obeying (16.2) is designated by
$\mathcal{H}(0, T)$.

We shall define the integral (16.1) in such a manner that it possesses the
following properties:

(i) If $f_1$ and $f_2$ belong to $\mathcal{H}(0, T)$, then
\[ \int_0^T [a_1 f_1(t) + a_2 f_2(t)]\, dB(t) = a_1 \int_0^T f_1(t)\, dB(t) + a_2 \int_0^T f_2(t)\, dB(t). \tag{16.3} \]
Of course, this is to be understood as a statement of equality between random
variables.

(ii) If $[\alpha, \beta] \subset [0, T]$ and
\[ I_{[\alpha, \beta]}(t) = \begin{cases} 1, & \alpha \le t \le \beta, \\ 0, & \text{otherwise}, \end{cases} \tag{16.4} \]
then
\[ \int_0^T I_{[\alpha, \beta]}(t)\, dB(t) = B(\beta) - B(\alpha). \]

(iii) If f(t) belongs to $\mathcal{H}(0, T)$, then
\[ E\left[\int_0^T f(t)\, dB(t)\right] = 0 \tag{16.5} \]
and
\[ E\left[\left\{\int_0^T f(t)\, dB(t)\right\}^2\right] = \int_0^T E[\{f(t)\}^2]\, dt. \tag{16.6} \]

†The assumption that f(t) is nonanticipative or adapted to $\mathcal{F}_t$ is often seen. In fact, the stronger
assumption that $f(t, \omega)$ is progressively measurable is needed.

With (i)-(iii) in mind, we define (16.1) first in the case that f(t) is a finite
step function in $\mathcal{H}(0, T)$, that is, there exists a subdivision of [0, T],
$0 = t_0 < t_1 < \cdots < t_n = T$, such that
\[ f(t) = f(t_i) \quad\text{on } t_i \le t < t_{i+1}, \quad i = 0, 1, 2, \ldots, n - 1. \tag{16.7} \]
The subclass of all finite step functions of $\mathcal{H}(0, T)$ is denoted $\mathcal{H}_S(0, T)$. For
f in $\mathcal{H}_S(0, T)$ (as in (16.7)), we define
\[ \mathcal{I}(f) = \int_0^T f(t)\, dB(t) = \sum_{i=0}^{n-1} f(t_i)[B(t_{i+1}) - B(t_i)]. \tag{16.8} \]
The indefinite random integral $\mathcal{I}(s; f)$ [also written as $\mathcal{I}_f(s) = \mathcal{I}_f(s, \omega)$],
$0 \le s \le T$, is defined for f in $\mathcal{H}_S(0, T)$ and is delineated as in (16.8) with
\[ f(t) \text{ replaced by } \tilde{f}(t) = \begin{cases} f(t), & 0 \le t \le s, \\ 0, & s < t \le T. \end{cases} \tag{16.8a} \]

The construction of $\mathcal{I}(f)$ and the indefinite integral is endowed with the
properties listed in the next lemma.


Lemma 16.1. For f and g in $\mathcal{H}(0, T)$, we have

(a) $E[\mathcal{I}(f)] = 0$, and
(b) $E[\mathcal{I}(f)\mathcal{I}(g)] = \int_0^T E[f(t)g(t)]\, dt.$ (16.9)
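
The two moment identities of the lemma can be checked by simulation for a concrete step integrand. The sketch below is an illustration only, under assumptions that are ours rather than the text's: it takes g = f, with f(t) = B(t_i) on each of n equal subintervals of [0, T], builds the sum (16.8) path by path, and compares the sample mean and second moment with 0 and with $\sum_i E[B(t_i)^2](t_{i+1} - t_i)$.

```python
import math
import random

def step_integral_moments(n_steps=4, T=1.0, n_paths=100_000, seed=1):
    """Sample E[I(f)] and E[I(f)^2] for the step integrand f(t) = B(t_i)
    on [t_i, t_{i+1}), together with the exact value of (16.9) with g = f."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)                 # standard deviation of B(t_{i+1}) - B(t_i)
    sum_i = 0.0
    sum_i2 = 0.0
    for _ in range(n_paths):
        b = 0.0                        # B(t_i), known at the left endpoint
        integral = 0.0
        for _ in range(n_steps):
            db = rng.gauss(0.0, sd)
            integral += b * db         # f(t_i) [B(t_{i+1}) - B(t_i)] as in (16.8)
            b += db
        sum_i += integral
        sum_i2 += integral ** 2
    # Exact right-hand side: sum over i of E[B(t_i)^2] * dt = sum of (i*dt) * dt
    exact = sum(i * dt * dt for i in range(n_steps))
    return sum_i / n_paths, sum_i2 / n_paths, exact
```

The key design point is that each increment is multiplied by the value of f at the left endpoint, which is what makes the cross terms vanish in expectation.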


The basic theorem is now stated. 


Theorem 16.1. Consider the process generated by the indefinite integral of an
element f of $\mathcal{H}(0, T)$ according to
\[ Y(t) = \mathcal{I}_f(t) = \int_0^t f(\tau)\, dB(\tau), \quad 0 \le t \le T. \tag{16.10} \]
The process $\{Y(t), 0 \le t \le T\}$ is a continuous time martingale with respect to
$\mathcal{F}_t$, having continuous sample paths.


The stochastic integral $\mathcal{I}(f)$ can be extended to all functions $f(t) = f(t, \omega)$
defined for $0 \le t \le T$, progressively measurable with respect to the family of
$\sigma$-fields $\mathcal{F}_t$, and obeying the integrability condition (16.2). We have denoted the
collection of all such functions by $\mathcal{H}(0, T)$. An elaborate approximation
procedure founded on the class of stochastic step functions $\mathcal{H}_S(0, T)$ using
more advanced real-variable analysis allows us to extend Theorem 16.1 as
follows:

Theorem 16.2. Let f be in $\mathcal{H}(0, T)$. Then there exists a square integrable
martingale process $\{X(t), t \ge 0\}$, for which $E[X(t)^2] < \infty$ for all $t \ge 0$, and such
that
\[ X(t) = \mathcal{I}_f(t) = \mathcal{I}_f(t, \omega) = \int_0^t f(\tau, \omega)\, dB(\tau) \]
satisfies the properties (16.9). Further, X(t) has almost all its sample paths
continuous.


A useful comparison of the Ito integral with ordinary Riemann integration 
is valid for certain classes of smooth random functions. 


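Theorem 16.3. Suppose f(t) is in $\mathcal{H}(0, T)$ and is continuous as a function of t
with probability 1. Consider a sequence of partitions
\[ 0 = t_0^{(n)} < t_1^{(n)} < \cdots < t_n^{(n)} = T \]
with maximum mesh size $\delta_n = \max_{1 \le i \le n} (t_i^{(n)} - t_{i-1}^{(n)})$. Then
\[ \mathcal{I}(f) = \int_0^T f(t)\, dB(t) = \lim_{\delta_n \to 0,\, n \to \infty} \sum_{i=1}^{n} f(t_{i-1}^{(n)})[B(t_i^{(n)}) - B(t_{i-1}^{(n)})]. \]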
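
The role of the left endpoint in these approximating sums can be seen numerically. The following sketch is illustrative and rests on our own choice f(t) = B(t), for which the Ito integral over [0, T] equals $\tfrac12[B(T)^2 - T]$. It forms the left-endpoint sum on a fine equal partition and, for contrast, the right-endpoint sum, which differs from it by the sum of squared increments and hence by approximately T.

```python
import math
import random

def endpoint_sums(n, T=1.0, seed=2):
    """Left- and right-endpoint sums of f(t) = B(t) against dB on an
    equal partition of [0, T] with n subintervals; also returns B(T)."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    b = 0.0
    left = 0.0
    right = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, sd)
        left += b * db             # integrand evaluated at t_{i-1} (Ito choice)
        right += (b + db) * db     # integrand evaluated at t_i (not the Ito integral)
        b += db
    return left, right, b
```

For a fine partition, `left` should be close to `(b**2 - T) / 2` on the same path, while `right - left` should be close to T.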


The collection of progressively measurable functions $\varphi(t, \omega)$, integrable in that
\[ \int_0^T |\varphi(t, \omega)|\, dt < \infty \quad\text{for almost every realization } \omega, \tag{16.11} \]
is denoted by $\mathcal{L}(0, T)$.
Let $\psi$ be in $\mathcal{H}(0, T)$ and $\varphi$ be in $\mathcal{L}(0, T)$. Consider the process
\[ X(t) = \int_0^t \varphi(\tau)\, d\tau + \int_0^t \psi(\tau)\, dB(\tau), \quad 0 \le t \le T, \tag{16.12} \]
which exhibits almost surely continuous paths. It is called a semimartingale
because the second integral is a bona fide martingale (Theorem 16.2) and the
first integral is almost surely a function of bounded variation.

It is convenient for later purposes to represent the process X(t) in the
differential notation
\[ dX(t) = \varphi(t)\, dt + \psi(t)\, dB(t). \tag{16.13} \]


THE ITO TRANSFORMATION FORMULA 


The statement of the Ito transformation formula, some background material,
and its formal operation are highlighted in Section 14. To ease the exposition
(without omitting any essential ideas) we assume at first that F(x, t) = F(x)
is a function of a single real variable and is twice continuously differentiable.
Consider the semimartingale X(t) defined in (16.12). Subject to mild
restrictions on F(x) the simplest version of the Ito transformation formula has the
expression
\[ dF(X(t)) = \left[F'(X(t))\varphi(t) + \tfrac12 F''(X(t))\psi^2(t)\right] dt + F'(X(t))\psi(t)\, dB(t). \tag{16.14} \]
The equivalent integrated form is
\[ F(X(t)) - F(X(0)) = \int_0^t \left[F'(X(\tau))\varphi(\tau) + \tfrac12 F''(X(\tau))\psi^2(\tau)\right] d\tau + \int_0^t F'(X(\tau))\psi(\tau)\, dB(\tau). \tag{16.15} \]
The precise statement is the following: 


Theorem 16.4. Let $\varphi(t, \omega)$ be in $\mathcal{L}(0, T)$ and $\psi(t, \omega)$ be in $\mathcal{H}(0, T)$. Assume
$F''(x)$ is bounded, and $F'(x)$ and $F(x)$ continuous. Then the Ito transformation
formula (16.15) holds.
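
As a small worked illustration of (16.15), under assumptions chosen here for simplicity and not taken from the text, let $\varphi \equiv 0$, $\psi \equiv 1$, and X(0) = 0, so that X(t) = B(t), and let $F(x) = x^2$, for which $F'' = 2$ is bounded. Then (16.15) reads
\[ B(t)^2 = \int_0^t \tfrac12 \cdot 2\, d\tau + \int_0^t 2B(\tau)\, dB(\tau) = t + 2\int_0^t B(\tau)\, dB(\tau), \]
so that
\[ \int_0^t B(\tau)\, dB(\tau) = \tfrac12\left[B(t)^2 - t\right]. \]
The term $-t/2$ is the contribution of the second-derivative term; the ordinary chain rule would give $\tfrac12 B(t)^2$ alone.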


EXTENSIONS 


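Subject to the stipulations of Theorem 16.4 plus the condition that $F_t(x, t) =
(\partial F/\partial t)(x, t)$ is continuous and $F_t(X(t), t) \in \mathcal{L}(0, T)$, where X(t) is prescribed in
(16.13), we have
\[
\begin{aligned}
F(X(t), t) - F(X(0), 0) = {} & \int_0^t \left[F_t(X(\tau), \tau) + F_x(X(\tau), \tau)\varphi(\tau) + \tfrac12 F_{xx}(X(\tau), \tau)\psi^2(\tau)\right] d\tau \\
& + \int_0^t F_x(X(\tau), \tau)\psi(\tau)\, dB(\tau)
\end{aligned} \tag{16.16}
\]
or in differential notation
\[ dF(X(t), t) = \left[F_t(X(t), t) + F_x(X(t), t)\varphi(t) + \tfrac12 F_{xx}(X(t), t)\psi^2(t)\right] dt + F_x(X(t), t)\psi(t)\, dB(t). \tag{16.17} \]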


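There is a multivariable version of (16.16) which we now state for
completeness. Consider the collection of processes
\[ dX_i(t) = \varphi_i(t, \omega)\, dt + \psi_i(t, \omega)\, dB(t), \quad i = 1, 2, \ldots, n, \]
and impose on $F(t, x_1, x_2, \ldots, x_n)$ the natural smoothness and integrability
conditions consonant with those of Theorem 16.4. Then
\[
\begin{aligned}
dF(t, X_1(t), \ldots, X_n(t)) = {} & \left[\frac{\partial F}{\partial t} + \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}\varphi_i(t, \omega) + \frac12 \sum_{i,j} \frac{\partial^2 F}{\partial x_i\, \partial x_j}\psi_i(t, \omega)\psi_j(t, \omega)\right] dt \\
& + \left[\sum_{i=1}^{n} \frac{\partial F}{\partial x_i}\psi_i(t, \omega)\right] dB(t),
\end{aligned} \tag{16.18}
\]
the arguments of the partials being
\[ \frac{\partial F}{\partial x_i}(t, X_1(t), \ldots, X_n(t)) \quad\text{and}\quad \frac{\partial^2 F}{\partial x_i\, \partial x_j}(t, X_1(t), \ldots, X_n(t)). \]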


EXISTENCE AND UNIQUENESS OF SOLUTIONS FOR STOCHASTIC DIFFERENTIAL EQUATION 


We elaborate a number of conditions guaranteeing existence and uniqueness
for solutions of the stochastic differential equation (16.13) where $\varphi(t, \omega) =
\varphi(X(t, \omega), t)$ and $\psi(t, \omega) = \psi(X(t, \omega), t)$. With this refinement, (16.13) then
becomes
\[ dX(t) = \varphi(X(t), t)\, dt + \psi(X(t), t)\, dB(t), \tag{16.19} \]
equivalent to the integral equation
\[ X(t) = X(0) + \int_0^t \varphi(X(\tau), \tau)\, d\tau + \int_0^t \psi(X(\tau), \tau)\, dB(\tau). \tag{16.20} \]

On this matter we should anticipate the usual restrictions imposed on $\varphi(x, t)$
and $\psi(x, t)$ arising in the ordinary differential equation framework and,
perhaps, more stringent conditions for the stochastic setting. The customary
stipulations are incorporated in paragraphs (a) and (b) that follow.


(a) Growth Condition 


There exists a constant K independent of $0 \le t \le T$ and $-\infty < x < \infty$ such
that
\[ \varphi^2(x, t) + \psi^2(x, t) \le K(1 + x^2), \quad -\infty < x < \infty. \tag{16.21} \]

Thus, the rate of growth of $\varphi$ and $\psi$ is at most linear in x as $|x| \to \infty$. This
restriction is crucial for otherwise the solutions of (16.19) [that is, the process
realizations of X(t)] can reach infinity in finite time with positive probability.
Such occurrences are reminiscent of sample path behavior in birth processes,
where with quadratic growth in the birth parameters the process X(t) can attain
an infinite boundary in finite time with positive probability (consult Chapter
4).

Again, in the context of ordinary differential equations some kind of growth
restriction is essential in order to be assured that the solution can be continued
for the total time horizon $0 \le t \le T$ without becoming infinite at an intermediate
time point.

Clearly, (16.21) applies to the circumstance
\[ \varphi(x, t) = x g(t), \tag{16.22} \]
with g(t) bounded for $0 \le t \le T$.
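
A standard deterministic example, supplied here for illustration and not taken from the text, shows what can go wrong without a growth restriction. Take $\psi \equiv 0$ and $\varphi(x, t) = x^2$, which grows quadratically and so violates a linear growth bound:
\[ \frac{dx}{dt} = x^2, \quad x(0) = x_0 > 0 \quad\Longrightarrow\quad x(t) = \frac{x_0}{1 - x_0 t}. \]
The solution becomes infinite at the finite time $t = 1/x_0$, so it cannot be continued over $0 \le t \le T$ whenever $T \ge 1/x_0$.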


(b) Lipschitz Conditions 


There exists a constant L independent of $0 \le t \le T$, and of x, y, $-\infty < x,
y < \infty$, such that
\[ |\varphi(x, t) - \varphi(y, t)| + |\psi(x, t) - \psi(y, t)| \le L|x - y|. \tag{16.23} \]
The necessity of a Lipschitz condition in guaranteeing uniqueness for solutions
of ordinary differential systems is classical.


We now state the first existence theorem. 


Theorem 16.5. Let $\varphi$ and $\psi$ satisfy conditions (16.21) and (16.23). Let X(0)
be prescribed obeying $E[\{X(0)\}^2] < \infty$. Then there exists a unique solution of
(16.20) as a continuous process.


In order to deal with (16.20) the method of successive approximation 
traditional in constructing a solution for a standard differential equation system 
is appropriately adapted. Extensive refinements are available (e.g., see Gikhman 
and Skorohod III [6, 7]). 
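
For numerical work, solutions of (16.19) are commonly approximated on a time grid. The sketch below uses the Euler-Maruyama scheme, which is a different device from the successive approximation argument mentioned above and is offered only as an illustration. The coefficient choice $\varphi(x, t) = -x$, $\psi(x, t) = 1$ (which satisfies both (16.21) and (16.23)), the step count, and all function names are assumptions made for the example.

```python
import math
import random

def euler_maruyama(phi, psi, x0, T, n_steps, rng):
    """Approximate X(T) for dX = phi(X, t) dt + psi(X, t) dB with X(0) = x0."""
    dt = T / n_steps
    sd = math.sqrt(dt)                 # standard deviation of each Brownian increment
    x, t = x0, 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, sd)
        # Coefficients are evaluated at the left endpoint of the step.
        x += phi(x, t) * dt + psi(x, t) * db
        t += dt
    return x

def sample_moments(n_paths=10_000, seed=3):
    """Sample mean and variance of X(1) for phi = -x, psi = 1, X(0) = 1."""
    rng = random.Random(seed)
    xs = [euler_maruyama(lambda x, t: -x, lambda x, t: 1.0, 1.0, 1.0, 100, rng)
          for _ in range(n_paths)]
    mean = sum(xs) / n_paths
    var = sum((x - mean) ** 2 for x in xs) / (n_paths - 1)
    return mean, var
```

For this linear example the exact values are $E[X(1)] = e^{-1}$ and $\operatorname{var} X(1) = (1 - e^{-2})/2$, which the sample moments should approach up to discretization and sampling error.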


SOME APPLICATIONS OF THE ITO TRANSFORMATION FORMULA 


An Exponential Example 
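Let X(t) be a solution to
\[ dX(t) = \varphi(t)\, dt + \psi(t)\, dB(t), \tag{16.24} \]
where $\varphi$ is in $\mathcal{L}(0, T)$ and $\psi$ is in $\mathcal{H}(0, T)$, and assume throughout this section
that $X(0) = x_0$ is constant. Consider $F(x) = e^x$. Observe that $F(x) = F'(x) =
F''(x)$. If
\[ \varphi(t, \omega) \text{ and } e^{X(t)}\varphi(t, \omega) \text{ are in } \mathcal{L}(0, T), \tag{16.25a} \]
while
\[ \psi(t, \omega) \text{ and } e^{X(t)}\psi(t, \omega) \text{ are of class } \mathcal{H}(0, T), \tag{16.25b} \]
then formula (16.15) gives for $Y(t) = e^{X(t)}$ the stochastic differential
\[ dY(t) = Y(t)\left\{\left[\varphi(t) + \tfrac12 \psi^2(t)\right] dt + \psi(t)\, dB(t)\right\}. \tag{16.26} \]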


Except for elementary variants on Brownian motion the conditions (16.25) 
are often formidable to verify or, on occasion, may not apply. To overcome these 
problems a truncation procedure is commonly implemented. 

Consider now the specialized process Z(t) expressed in differential form by
\[ dZ(t) = -\tfrac12 \psi^2(t)\, dt + \psi(t)\, dB(t) \]
and in integral form by
\[ Z(t) - Z(0) = -\frac12 \int_0^t \psi^2(\tau)\, d\tau + \int_0^t \psi(\tau)\, dB(\tau). \]
Then, on account of the Ito transformation formula and (16.26), $R(t) =
\exp[Z(t)]$ satisfies the stochastic differential equation
\[ dR(t) = R(t)\psi(t)\, dB(t), \tag{16.27} \]
which entails
\[ R(t) - R(0) = \int_0^t R(\tau)\psi(\tau)\, dB(\tau), \tag{16.28} \]
and the right-hand side should represent a continuous square integrable
martingale with respect to $\{\mathcal{F}_t\}$ by Theorem 16.2.

It is useful to extend (16.28), introducing a real parameter $\lambda$. Accordingly,
with $\psi$ in $\mathcal{H}(0, T)$, then
\[ R(t; \lambda) = \exp\left[-\frac12 \lambda^2 \int_0^t \psi^2(\tau)\, d\tau + \lambda \int_0^t \psi(\tau)\, dB(\tau)\right] \]
constitutes a square integrable martingale process. The formal justifications
are quite formidable since a hierarchy of integrability properties needs to be
checked.

The special choice $\psi(t, \omega) \equiv 1$ (a constant) produces the family of martingale
processes
\[ R(t; \lambda) = e^{-\lambda^2 t/2 + \lambda B(t)} \]
already encountered in Eq. (5.1) in Chapter 7.
Parenthetically, the solution of
\[ dR(t) = R(t)\, dB(t) \]
in the Ito sense is the martingale process
\[ R(t) = R(0)\, e^{B(t) - t/2} \]
and not the solution obtained by integrating dR(t)/R(t) = dB(t) along each
sample path.
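
The distinction drawn in the last sentence can be observed on a single simulated path. The sketch below is illustrative only: it accumulates the Ito increments R dB on a fine grid (with R(0) = 1, an assumption made for the example) and compares the result with the Ito solution $e^{B(T) - T/2}$ and with the naive pathwise candidate $e^{B(T)}$.

```python
import math
import random

def compare_solutions(n=100_000, T=1.0, seed=4):
    """Discretize dR = R dB with left-endpoint increments and return the
    discretized value, exp(B(T) - T/2), and exp(B(T)) for the same path."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n)
    r_discrete = 1.0                   # R(0) = 1
    b = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, sd)
        r_discrete += r_discrete * db  # R(t_i) [B(t_{i+1}) - B(t_i)]
        b += db
    ito = math.exp(b - 0.5 * T)        # solution in the Ito sense
    pathwise = math.exp(b)             # naive integration of dR/R = dB
    return r_discrete, ito, pathwise
```

On a fine grid the discretized value should track `ito` closely, while `pathwise` exceeds it by the factor $e^{T/2}$.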


THE DIFFUSION COEFFICIENTS FOR A STOCHASTIC DIFFERENTIAL EQUATION 


A diffusion process is characterized through its infinitesimal conditional mean
displacement and infinitesimal variance coefficient determined by
\[ \lim_{h \downarrow 0} \frac{1}{h} E[X(t + h) - X(t) \mid X(t) = x] = \mu(x, t), \]
\[ \lim_{h \downarrow 0} \frac{1}{h} E[\{X(t + h) - X(t)\}^2 \mid X(t) = x] = \sigma^2(x, t), \tag{16.29} \]
and for any $\varepsilon > 0$,
\[ \lim_{h \downarrow 0} \frac{1}{h} \Pr\{|X(t + h) - X(t)| > \varepsilon \mid X(t) = x\} = 0; \tag{16.30} \]
the convergence is uniform with respect to x and t confined to a finite segment.




A more accurate formulation replaces (16.29). It involves truncated moments 
endowed with the same limiting relations and circumvents the need to postulate 
the existence of moments. In most practical applications of diffusions the con- 
ditions (16.29) plus (16.30) directly apply. 

As pointed out in Section 1, property (16.30) is implied by the moment
inequality
\[ E[|X(t + h) - X(t)|^{\delta} \mid X(t) = \xi] \le M(\xi)\, h^{1 + \gamma} \tag{16.31} \]
for $\delta$ and $\gamma$ positive where $M(\xi)$ is bounded over any compact region of $\xi$.
Consider the diffusion process $\{X(t), t \ge 0\}$ that solves the stochastic
differential equation (16.19) concomitant with the smoothness restrictions of
(16.21) and (16.23). Its infinitesimal coefficients are described in the next theorem.


Theorem 16.6. Let $\{X(t), t \ge 0\}$ be the diffusion that arises as a solution of the
stochastic differential equation (16.19), where the coefficients $\varphi(x, t)$ and $\psi(x, t)$
satisfy the growth and smoothness conditions (16.21) and (16.23), respectively.
Then
\[ \lim_{h \downarrow 0} \frac{1}{h} E[X(t + h) - X(t) \mid X(t) = x] = \varphi(x, t), \tag{16.32} \]
\[ \lim_{h \downarrow 0} \frac{1}{h} E[\{X(t + h) - X(t)\}^2 \mid X(t) = x] = \psi^2(x, t), \tag{16.33} \]
and the limits exist uniformly for t and x ranging over any bounded region.
Moreover,
\[ E[\{X(t + h) - X(t)\}^4 \mid X(t) = x] \le h^2 C(t, x), \tag{16.34} \]
where C(t, x) is uniformly bounded for t and x traversing a bounded region.


A FAMILY OF MARTINGALES 
Let X(t) solve uniquely the stochastic differential equation
\[ dX(t) = \varphi(X(t), t)\, dt + \psi(X(t), t)\, dB(t) \quad\text{with } X(0) = x_0, \tag{16.35} \]
where $\varphi$ and $\psi$ fulfill the growth and Lipschitz conditions (16.21) and (16.23),
respectively.
We can write (16.35) equivalently in the form
\[ X(t) - X(0) = \int_0^t \varphi(X(\tau), \tau)\, d\tau + \int_0^t \psi(X(\tau), \tau)\, dB(\tau). \tag{16.36} \]
We now form the process
\[ Z_\lambda(t) = \lambda X(t) - \lambda \int_0^t \varphi(X(\tau), \tau)\, d\tau - \frac12 \lambda^2 \int_0^t \psi^2(X(\tau), \tau)\, d\tau \tag{16.37} \]
with $\lambda$ a fixed real parameter, $t \ge 0$, such that
\[
\begin{aligned}
dZ_\lambda(t) &= \lambda\, dX(t) - \left[\lambda \varphi(X(t), t) + \tfrac12 \lambda^2 \psi^2(X(t), t)\right] dt \\
&= -\tfrac12 \lambda^2 \psi^2(X(t), t)\, dt + \lambda \psi(X(t), t)\, dB(t).
\end{aligned} \tag{16.38}
\]

It is possible to apply the Ito formula (16.16) for $F(x) = e^x$ [cf. the discussion
of (16.26)-(16.28)] using recondite truncation arguments to obtain the following
important theorem.


Theorem 16.7. Let $\varphi(x, t)$ and $\psi(x, t)$ satisfy the conditions of Theorem 16.5
and suppose X(t) is the unique diffusion process solving (16.35) with coefficients
$\varphi(x, t)$ and $\psi(x, t)$. Then for each real $\lambda$, the process
\[ U_\lambda(t) = \exp\left[\lambda X(t) - \lambda \int_0^t \varphi(X(\tau), \tau)\, d\tau - \frac12 \lambda^2 \int_0^t \psi^2(X(\tau), \tau)\, d\tau\right] \tag{16.39} \]
constitutes a continuous path square integrable martingale.


There is a converse to Theorem 16.7 known as the martingale property
characterization of a diffusion process. It is delicate and technical and can be
stated as follows. Under suitable regularity conditions on $\varphi(x, t)$ and $\psi(x, t)$,
if $U_\lambda(t)$ defined in (16.39) constitutes a continuous path martingale for each real $\lambda$
then $\{X(t), t \ge 0\}$ constitutes a diffusion process with mean infinitesimal
displacement $\varphi(x, t)$ and infinitesimal diffusion coefficient $\psi^2(x, t)$.

With respect to Sections 14-16 we encourage the reader to tackle Problems 
36-38. 


Elementary Problems 


1. Calculate the covariance $K(s, t) = E[Y(s)Y(t) \mid Y(0) = 0]$ for an
Ornstein-Uhlenbeck process having infinitesimal parameters
$\mu(y) = -\beta y$ and $\sigma^2(y) = 1$.
Solution: $K(s, t) = e^{-\beta(t - s)}(1 - e^{-2\beta s})/2\beta$, for $s < t$.
2. Consider a diffusion process on the interval [0, 1] in natural scale [$s(\xi) = 1$] and
with a speed density $m(\xi)$ which is continuous and positive. Determine the diffusion
coefficients $\mu(x)$ and $\sigma^2(x)$.


3. Let {B(t)} be a standard Brownian motion and consider the transformation
obtained from the function f(x, t) = x + t. Then Y(t) = f(B(t), t) is a diffusion. Identify it.

4. Consider a regular diffusion X(t) on (0, 1) with variance coefficient $\sigma^2(x) =
x^2(1 - x)^2$. Show that the diffusion
\[ Y(t) = \log\frac{X(t)}{1 - X(t)} \]
on $(-\infty, +\infty)$ has a constant infinitesimal variance.
5. If X(t) is a diffusion process on $(0, \infty)$ with infinitesimal parameters $\mu(x) = bx + c$,
$c > 0$, and $\sigma^2(x) = 4x$, what are the infinitesimal parameters of the diffusion process
$Y(t) = \sqrt{X(t)}$?

Solution: $\mu_Y(y) = (by^2 + c - 1)/2y$, $\sigma_Y^2(y) = 1$.


6. Let X(t) be a diffusion process on $(0, \infty)$ with diffusion coefficients $\mu(x) = cx$ and
$\sigma^2(x) = x^{\alpha}$, with $c > 0$ and $\alpha \ne 2$. Consider the diffusion $Y(t) = [X(t)]^{\beta}$. What choice
of $\beta$ will give a constant infinitesimal variance for Y(t)?


Solution: $\beta = -(\tfrac12\alpha - 1)$.


7. Consider a diffusion process X(t) on $[0, \infty)$ for which $\mu(x) = \mu x$ and $\sigma^2(x) = \sigma^2$
for fixed positive parameters $\mu$ and $\sigma^2$. The left boundary 0 is prescribed to be an
absorbing state. In the financial analysis literature, this process is sometimes called a
compounded Brownian motion, where X(t) can be interpreted as assets which increase
through interest earnings at a mean rate $\mu$ but perturbed by Brownian motion
fluctuations. (i) Verify that 0 is a regular boundary, consonant with the prescription of 0 as an
absorbing state. (ii) Find the probability u(x) of ruin, that is, the probability that the
process reaches 0 from an initial level $x > 0$.


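8. Consider the infinitesimal parameters
\[ \mu(x) = \begin{cases} -\alpha x & \text{for } x > 0, \\ -\beta x & \text{for } x < 0, \end{cases} \]
$\sigma^2(x) = 1$ on $I = (-\infty, +\infty)$ where $\alpha, \beta > 0$. Compute the scale and speed functions
$s(\xi)$, $m(\xi)$ and the stationary density $\psi(\xi)$.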


9. A quite general diffusion population growth model is characterized by the
infinitesimal coefficients
\[ \mu(y) = \alpha y, \qquad \sigma^2(y) = \tau y + \omega y^2, \qquad I = (0, \infty). \]
Classify the boundary 0 under various assumptions on $\tau \ge 0$ and $\omega \ge 0$ with $\tau + \omega > 0$
and $-\infty < \alpha < \infty$.


10. Consider a diffusion process on $[0, \infty)$ with infinitesimal drift $\mu(y) = \alpha y$ and
variance $\sigma^2(y) = \beta y$ with $\alpha, \beta > 0$. What is the probability of absorption at 0?


11. Consider a diffusion on [0, 1] with infinitesimal coefficients
\[ \mu(x) = (v - r)(1 - 2x)x(1 - x), \qquad \sigma^2(x) = [1 + 2(v - r)x(1 - x)](1 - x)x, \qquad 0 < |r| < v. \]


Let E*(x) be the expected time to absorption at the boundaries 0 or 1 starting from 
X(0) = x. 
Establish the formula 


E*(x) = —2 f inl, Fe 


with $s(x) = 1/[1 + 2(v - r)x(1 - x)]$.

12. Consider a standard Brownian motion {B(t)} and let $\tau$ be the random time that the
linear barrier $at + b$ is first reached by B(t). Show that $E[e^{-\theta\tau}] = \exp\{-b[\sqrt{a^2 + 2\theta} + a]\}$.


Hint: Use the martingale
\[ Z_\lambda(t) = \exp\{\lambda B(t) - \tfrac12 \lambda^2 t\}, \quad -\infty < \lambda < \infty. \]


13. Let {X(t)} be an Ornstein-Uhlenbeck process with infinitesimal parameters
$\mu(x) = -\alpha x$ and $\sigma^2(x) = 1$, $\alpha > 0$. Show that the infinitesimal parameters of the diffusion
process defined by $Y(t) = e^{X(t)}$ are $\mu(x) = \tfrac12 x - \alpha x \log x$ and $\sigma^2(x) = x^2$.


14. Let {X(t)} be reflected Brownian motion on $[0, \infty)$. Show that the infinitesimal
parameters for $Y(t) = X(t)/[1 + X(t)]$ are $\mu(y) = (y - 1)^3$ and $\sigma^2(y) = (y - 1)^4$.


15. Let {U(t)} be an Ornstein-Uhlenbeck process with infinitesimal parameters
$\mu(x) = -\gamma x$ and $\sigma^2(x) = 1$. Show that X(t) = |U(t)| defines a diffusion process and
determine the infinitesimal parameters. What should the boundary condition at the origin
be?


16. Let {X(t)} be a time inhomogeneous diffusion process with drift coefficient $\mu(x, t)$ and
diffusion coefficient $\sigma^2(x, t)$. Let u(x, t) be the probability of hitting b before a starting
from X(t) = x. Formally,
\[ u(x, t) = \Pr\{X(T) = b \mid X(t) = x\}, \quad a < x < b, \]
where $T = \inf\{s \ge t: X(s) = a \text{ or } X(s) = b\}$.
(i) In the spirit of Section 3, establish that u(x, t) satisfies
\[ \frac{\partial u}{\partial t} + \mu(x, t)\frac{\partial u}{\partial x} + \frac12 \sigma^2(x, t)\frac{\partial^2 u}{\partial x^2} = 0, \quad a < x < b, \]
and u(a, t) = 0, u(b, t) = 1.
(ii) What is the corresponding differential equation for the mean time $v(x, t) =
E[T \mid X(t) = x]$?


17. Let X(t) be a regular diffusion process on [a, b] with both boundaries exit. Derive
a differential equation for
\[ \varphi_n(x) = E\left[\int_0^T t^n g(X(t))\, dt \,\Big|\, X(0) = x\right], \quad n \ge 1, \]
where T is the first passage time to the boundaries.


Solution: $\mu(x)\varphi_n'(x) + \tfrac12 \sigma^2(x)\varphi_n''(x) = -n\varphi_{n-1}(x)$, $a < x < b$, with $\varphi_n(a) = \varphi_n(b) = 0$,
and
\[ \varphi_0(x) = w(x) = E\left[\int_0^T g(X(t))\, dt \,\Big|\, X(0) = x\right]. \]

18. Let {X(t)} be a diffusion process with infinitesimal parameters $\mu(x)$ and $\sigma^2(x)$.
Define Y(t) = F(X(t), t) where F(x, t) is smooth (possesses whatever derivatives are
necessary) and $\partial F/\partial x = F_x(x, t) \ne 0$ for all x and t. Argue that {Y(t)} is a (time
inhomogeneous) diffusion and determine the infinitesimal coefficients.

Solution:
\[ \mu_Y(y, t) = \frac{\partial F}{\partial t}(x, t) + \mu(x)\frac{\partial F}{\partial x}(x, t) + \frac12 \sigma^2(x)\frac{\partial^2 F}{\partial x^2}(x, t), \]
\[ \sigma_Y^2(y, t) = \left[\frac{\partial F}{\partial x}(x, t)\right]^2 \sigma^2(x), \]
where x is determined uniquely (why?) such that F(x, t) = y.


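19. Let {X(t)} be a diffusion process on $(-\infty, +\infty)$ with infinitesimal parameters
\[ \mu(x) = -\operatorname{sgn}(x) = \begin{cases} -1, & x > 0, \\ 0, & x = 0, \\ 1, & x < 0, \end{cases} \]
and $\sigma^2(x) = 1$ for all x.

(i) Verify the scale functions $s(\xi) = e^{2|\xi|}$, $-\infty < \xi < \infty$, and $S(x) =
\tfrac12 (\operatorname{sgn} x)(e^{2|x|} - 1)$.

(ii) Show that
\[ \psi(x) = e^{-2|x|}, \quad -\infty < x < \infty, \]
is a stationary distribution for the process.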
20. Consider the diffusion on (0, 1) with infinitesimal parameters 
u(x) = x?(1 — xa — Bx”) and o? (x) = Bx*(1 — x). 
If 0 < « < 48, show that there exists a unique stationary density y(x) and compute it. 
Hint: Use Eq. (5.34) and show C, = 0. 


Solution: 


e” 2ax/p 


a = x)? Pyt 2018? 


W(x) =¢ 


0<x<il, 


where c is a normalizing constant. 


21. Let $\{X(t), -\infty < t < \infty\}$ be a stationary Ornstein-Uhlenbeck process with
infinitesimal parameters $\mu(x) = -x$ and $\sigma^2(x) = 1$. Characterize the diffusion process
\[ W(t) = \sqrt{t}\, X(\tfrac12 \ln t). \]

Solution: W(t) is standard Brownian motion.

22. What is the transition density for geometric Brownian motion (see Example D,
Section 2)?

Solution: 


1 = Z al where (z) = z exp(—43z”). 


yot a o/t V 27 


23. What is the probability that a standard Brownian motion in n dimensions starting
at $x = (x_1, \ldots, x_n)$ with $\|x\| = r$ will achieve a magnitude $r_2$ before getting closer to
the origin than $r_1$? Of course, $r_1 < r < r_2$.


p(t, x, y) 
Hint: The problem reduces to calculating the scale function for the radial process
{R(t)}, which is a one-dimensional diffusion on $[0, \infty)$ with infinitesimal parameters
$\mu(r) = (n - 1)/2r$ and $\sigma^2(r) = 1$.


24. Show that the boundaries in the approximating diffusion for the Wright-Fisher 
random sampling model with mutation correspond to reflecting barriers (Section 2.F(a)). 
(The time at the boundary is negligible relative to the speed measure.) 


25. Consider the instantaneous return process Z(t) (Example E, Section 8) induced
by a regular diffusion X(t) on (l, r) relative to an interval $[a, b] \subset (l, r)$. Suppose
that when X(t) first attains a or b it is returned to the interior of (a, b) according to a
fixed probability density function f(x), $a < x < b$. What is the limiting (stationary)
distribution of Z(t)?


Solution:
\[ \psi(y) = \frac{\int_a^b f(x)G(x, y)\, dx}{\int_a^b \left[\int_a^b f(x)G(x, \eta)\, dx\right] d\eta}, \]
where G is the Green function on [a, b].


26. Let {X(t)} be a regular diffusion on [0, 1] with absorption at 0 and 1. The process
{X*(t)} is derived by restarting the X(t) process at a if 0 is hit, and restarting at b if 1
is hit. Find the stationary distribution $\pi(x)$ of {X*(t)}.


Solution:
\[ \pi(x) = \frac{\pi_0(b)G(a, x) + \pi_1(a)G(b, x)}{\int_0^1 [\pi_0(b)G(a, \xi) + \pi_1(a)G(b, \xi)]\, d\xi}, \]
where G(y, x) is the Green function corresponding to {X(t)} on [0, 1] and $\pi_i(x) = \Pr\{X(t)$
hits $i \mid X(0) = x\}$.


27. Consider the gene frequency selection model (Section 2.F(b)). For the state x in
(0, 1) the function $l(x) = s(1 - x)$ is called the load of the fraction of deleterious gene,
where s is the corresponding unit cost of carrying each such individual in the population.
Compute the cumulative expected load until fixation
\[ L(x) = E\left[\int_0^T s(1 - X(t))\, dt \,\Big|\, X(0) = x\right], \]
where T is the fixation time (hitting time to 0 or 1).


28. Consider a diffusion process on $(0, \infty)$ with infinitesimal parameters $\mu(x) = cx^{\alpha}$,
$\sigma^2(x) = dx^{\beta}$, $\alpha > 0$, $\beta > 0$, $-\infty < c < \infty$, and $d > 0$. Classify the boundaries 0 and $\infty$
in terms of $\alpha$, $\beta$, c, and d.


Solution: The classification involves the breaking down into cases and the calculation of
integrals. Some results are that the 0 boundary is


Regular if <2 and idh —1)< e< 4d, 
Exit if pe2 and ee 4d- 1), p~a=l. 
intrance if f<e2 and c dd 
Problems


1. A quality control model leads to a diffusion process {X(t)} on $[0, \infty)$ having the
infinitesimal parameters $\mu(x) = 1 + x$ and $\sigma^2(x) = x^2$. (i) Show that 0 is an entrance
boundary. (ii) Evaluate $v(0) = E[T_A \mid X(0) = 0]$ where $T_A = \inf\{t \ge 0: X(t) = A\}$.


Answer: $v(0) = 2\int_0^A \left[\int_0^\eta m(\xi)\, d\xi\right] s(\eta)\, d\eta$ where $m(\xi) = e^{-2/\xi}$ and $s(\eta) = \eta^{-2} e^{2/\eta}$. This
simplifies to $v(0) = 2e^{2/A} E^*(2/A)$ where $E^*(x)$ is the exponential integral
\[ E^*(x) = \int_x^\infty u^{-1} e^{-u}\, du. \]


2. A certain population is modeled as a diffusion process {X(t)} on $(0, \infty)$ with drift
$\mu(x) = cx$, c fixed, and diffusion $\sigma^2(x) = d^2 x$. Observe that according to (1.14),
\[ U_\lambda(t) = \exp\left[\lambda X(t) - \lambda c \int_0^t X(s)\, ds - \frac12 \lambda^2 d^2 \int_0^t X(s)\, ds\right] \]
is a martingale for every $\lambda > 0$. Suppose c and d are such that absorption at 0 (extinction)
occurs with probability 1. Let T be the time of extinction and $Z = \int_0^T X(s)\, ds$ a measure
of the aggregate population size. Determine the distribution of Z using the martingale
$U_\lambda(t)$.


3. Consider a time inhomogeneous diffusion process {X(t)} with diffusion coefficients
$\mu(x, t) = \alpha t$ and $\sigma^2(x, t) = \beta t$, $\alpha > 0$, $\beta > 0$. Let $T = T_a$ be the first passage time to the
level a and evaluate $E[e^{-\lambda T^2} \mid X(0) = 0]$, $\lambda > 0$.


Hint: Use the martingale
\[ Y_\theta(t) = \exp\left\{\theta X(t) - \theta \int_0^t \mu(X(s), s)\, ds - \frac12 \theta^2 \int_0^t \sigma^2(X(s), s)\, ds\right\}, \quad \theta \text{ real}. \]


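4. Consider an n-dimensional standard Brownian motion and let $\bar{D}$ be the closure of a
bounded open set D in $E^n$ (Euclidean n-space). Let T be the first exit time from D. Argue
by the informal methods of Section 3 that
\[ u(x) = E[f(X(T)) \mid X(0) = x], \quad x \text{ in } D, \]
satisfies
\[ \tfrac12 \Delta u = 0 \quad\text{for } x \text{ in } D \qquad \left(\Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_n^2}\right) \]
and
\[ u(x) = f(x) \quad\text{for } x \text{ in } \partial D, \text{ the boundary of } D. \]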


5. Suppose that a diffusion {X(t)} with killing time $\zeta$ has infinitesimal coefficients
$\mu(x) = \theta$, $\sigma^2(x) = x$, and $k(x) = \theta x$ on the state space $(0, \infty)$. Show that $v(x) = E[\zeta \mid X(0)
= x]$ solves the differential equation
\[ \tfrac12 x v''(x) + \theta v'(x) - \theta x v(x) = -1, \quad 0 < x < \infty, \]
with the boundary condition $v(x) \downarrow 0$ as $x \uparrow \infty$.




6. Consider a discrete generation population growth process following $N_{n+1} = Z_n N_n$,
$n = 0, 1, \ldots$, where $N_n$ is the population size in generation $n$ and $\{Z_n\}$ are independent
identically distributed positive random variables with $E[Z_0] < \infty$ and $E[|\ln Z_0|] < \infty$.
Set $\mu = E[\ln Z_0]$ and $\sigma^2 = \operatorname{var}(\ln Z_0)$. Determine in what sense $N_n$ can be approximated
by the process $N_t = N_0 \exp[\mu t + \sigma B(t)]$, where $\{B(t)\}$ is standard Brownian motion.


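7. For each integer $N$ consider the discrete recursion

\[ X^{(N)}_{n+1} - X^{(N)}_n = \left( \frac{\alpha}{N} + \frac{\sigma}{\sqrt{N}}\, \eta^{(N)}_n \right) X^{(N)}_n, \]

where $\{\eta^{(N)}_n\}$ are independent random variables with mean 0 and variance 1. Let
$X^{(N)}(t) = X^{(N)}_{[Nt]}$, where $[Nt]$ is the largest integer not exceeding $Nt$. Prove that $X^{(N)}(t) \to X(t)$,
where $\log X(t)$ is a Brownian motion with drift $\alpha - \tfrac{1}{2}\sigma^2$ and variance parameter $\sigma^2$.


Hint: Iteration produces

\[ X^{(N)}_{[Nt]} = X^{(N)}_0 \prod_{k=0}^{[Nt]-1} \left( 1 + \frac{\alpha}{N} + \frac{\sigma}{\sqrt{N}}\, \eta^{(N)}_k \right). \]

Taking logarithms,

\[ \ln X^{(N)}_{[Nt]} - \ln X^{(N)}_0 = \sum_{k=0}^{[Nt]-1} \ln\left( 1 + \frac{\alpha}{N} + \frac{\sigma}{\sqrt{N}}\, \eta^{(N)}_k \right)
   = \sum_{k=0}^{[Nt]-1} \left[ \frac{\alpha}{N} + \frac{\sigma}{\sqrt{N}}\, \eta^{(N)}_k - \frac{1}{2} \frac{\sigma^2}{N} \left( \eta^{(N)}_k \right)^2 + o\!\left( \frac{1}{N} \right) \right]. \]

Now invoke the law of large numbers and the central limit theorem.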


8. Let $\{X(t)\}$ be a process whose sample paths contain both deterministic and jump
components. During the deterministic part, $X(t)$ grows exponentially in the manner
$dX(t)/dt = \alpha X(t)$. Random jumps also occur such that in the time period $(t, t + h)$
$X(t)$ changes to $\beta X(t)$ with probability $\tfrac{1}{2}\lambda h + o(h)$, $X(t)$ changes to $(2 - \beta)X(t)$ with
probability $\tfrac{1}{2}\lambda h + o(h)$, and no jump occurs with probability $1 - \lambda h + o(h)$. Consider
the process $X(t) = X(t; \lambda, \alpha, \beta)$, displaying the dependence on parameters. Let $\beta \to 1$
and $\lambda \to \infty$ such that $\lambda(\beta - 1)^2 \to a > 0$. Show that $X(t)$ converges to a geometric
Brownian motion with infinitesimal parameters $\mu(x) = \alpha x$ and $\sigma^2(x) = a x^2$.


Hint: Calculate $E[(\Delta X)^n \mid X(t) = x]$ for $n = 1, 2,$ and $4$, where $\Delta X = X(t + h) - X(t)$.


9. Consider a sequence of branching processes $\{Z_N(t), t \ge 0\}$ with initial size $Z_N(0) = N$.
The offspring per individual has probability generating function $f_N(s)$ (see Chapter 8)
and moments

\[ E[Z_N(1) \mid Z_N(0) = 1] = f_N'(1) = 1 + \frac{\alpha}{N} + o\!\left( \frac{1}{N} \right), \]

\[ \operatorname{var}[Z_N(1) \mid Z_N(0) = 1] = \beta, \quad \text{independent of } N. \]

Consider the normalized process $Y_N(t) = Z_N([Nt])/N$. Show that

\[ \lim_{N \to \infty} N\, E\!\left[ Y_N\!\left( t + \tfrac{1}{N} \right) - Y_N(t) \,\Big|\, Y_N(t) = y \right] = \alpha y, \]

\[ \lim_{N \to \infty} N\, E\!\left[ \left\{ Y_N\!\left( t + \tfrac{1}{N} \right) - Y_N(t) \right\}^2 \,\Big|\, Y_N(t) = y \right] = \beta y, \]

and the fourth-moment displacement converges to zero. The calculations suggest that
$Y_N(t)$ converges to a limiting diffusion on $[0, \infty)$ with infinitesimal parameters $\mu(y) = \alpha y$
and $\sigma^2(y) = \beta y$. What assumptions on $f_N(s)$ insure that

\[ N\, E\!\left[ \left\{ Y_N\!\left( t + \tfrac{1}{N} \right) - Y_N(t) \right\}^4 \,\Big|\, Y_N(t) = y \right] \to 0 \]

uniformly for $y$ in bounded intervals?


10. Consider a sequence of branching processes $\{Z_N(t), t \ge 0\}$ with initial size $Z_N(0) =
N$. The offspring per individual has probability generating function $f_N(s)$ and moments

\[ E[Z_N(1) \mid Z_N(0) = 1] = f_N'(1) = 1 + o\!\left( \frac{1}{N} \right), \]

\[ \operatorname{var}[Z_N(1) \mid Z_N(0) = 1] = 1, \quad \text{independent of } N, \]

and the fourth central moment is $o(1/N)$. Suppose that a random number of immigrants
enter the population each generation at an average rate of $c/N$ with variance $o(1/N)$.
By adapting the procedure of Problem 9 show formally that $Y_N(t) = N^{-1} Z_N([Nt])$ can
be approximated by a diffusion on $[0, \infty)$ with $\sigma^2(x) = x$ and $\mu(x) = c$.


11. Let $\{Y_i(t)\}_{i=1,2}$ be independent diffusion processes on $I = (0, \infty)$ having infinitesimal
coefficients $\mu_i(y) = c_i > 0$ and $\sigma_i^2(y) = y$. Show that $X(t) = Y_1(t) + Y_2(t)$ defines a
diffusion process on $(0, \infty)$ for which $\mu_X(x) = c_1 + c_2$ and $\sigma_X^2(x) = x$.


Hint: Establish the corresponding property for the branching processes of Problem 10 
and pass to the limit. 


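12. Consider a family of diffusion processes $X_\alpha(t)$ on $(0, \infty)$ with coefficients

\[ \mu(x) = ax + \alpha, \quad a > 0, \quad \alpha > 0; \qquad \sigma^2(x) = \sigma^2 x, \quad \sigma > 0. \]

(i) Classify the boundaries 0 and $\infty$.

(ii) Let $X_\alpha(t)$ and $X_\beta(t)$ be independent processes with $X_\alpha(0) = x_\alpha$ and $X_\beta(0) = x_\beta$.
Establish that for every $t$

\[ \Pr\{X_\alpha(t) + X_\beta(t) \le x\} = \Pr\{X_{\alpha+\beta}(t) \le x \mid X_{\alpha+\beta}(0) = x_\alpha + x_\beta\}. \]

Hint: Refer to Problems 9-11.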


13. Consider the diffusion process $\{X(t)\}$ on $[0, \infty)$ having infinitesimal coefficients
$\sigma^2(x) = 2x$, $\mu(x) = 0$. (The endpoint 0 is an exit boundary.) Establish the Laplace
transform

\[ E[e^{-\lambda X(t)} \mid X(0) = x] = \exp\left\{ \frac{-\lambda x}{1 + \lambda t} \right\}, \qquad \lambda > 0, \]

by showing that the right-hand side $\varphi(\lambda; x, t)$ satisfies the differential equation

\[ \frac{\partial \varphi}{\partial t} = x\, \frac{\partial^2 \varphi}{\partial x^2}, \qquad x > 0, \quad t > 0, \]

and the initial condition $\varphi(\lambda; x, 0) = e^{-\lambda x}$.
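The verification asked for in Problem 13 can be previewed with finite differences. This sketch is an illustration added here, not the analytic argument the problem requests: it assumes the candidate $\varphi(\lambda; x, t) = \exp\{-\lambda x/(1 + \lambda t)\}$ and measures the residual of $\partial\varphi/\partial t - x\, \partial^2\varphi/\partial x^2$ at a few sample points. The step size `h` and the function names are arbitrary choices.

```python
import math

def phi(lam, x, t):
    """Candidate Laplace transform exp(-lam * x / (1 + lam * t))."""
    return math.exp(-lam * x / (1.0 + lam * t))

def pde_residual(lam, x, t, h=1e-3):
    """Central-difference estimate of d(phi)/dt - x * d^2(phi)/dx^2."""
    dphi_dt = (phi(lam, x, t + h) - phi(lam, x, t - h)) / (2.0 * h)
    d2phi_dx2 = (phi(lam, x + h, t) - 2.0 * phi(lam, x, t) + phi(lam, x - h, t)) / h**2
    return dphi_dt - x * d2phi_dx2

if __name__ == "__main__":
    for x in (0.5, 1.0, 3.0):
        for t in (0.1, 1.0):
            print(x, t, pde_residual(2.0, x, t))
```

Residuals of the order of the discretization error (far below the size of either derivative) are what one expects if the candidate solves the equation.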

14. Let $\{B(t)\}$ be a standard Brownian motion and suppose $A(x)$ is a positive function
for which $\int_0^\infty [A(B(t))]^{-1}\, dt = \infty$ with probability 1. For each $t$ define the random
variable $C(t)$ by

\[ \int_0^{C(t)} \frac{1}{A(B(\tau))}\, d\tau = t \]

and form the process $Z(t) = B(C(t))$. Show that $\{Z(t)\}$ is a diffusion with infinitesimal
parameters $\mu_Z(z) = 0$ and $\sigma_Z^2(z) = A(z)$.


Hint: Observe that

\[ \int_{C(t)}^{C(t+h)} \frac{1}{A(B(\tau))}\, d\tau = h, \]

so that

\[ C(t + h) - C(t) = A(B(C(t)))\, h + o(h). \]

Then evaluate

\[ E[\{Z(t + h) - Z(t)\}^n \mid Z(t) = z] = E[\{B(C(t + h)) - B(C(t))\}^n \mid B(C(t)) = z]
   = \begin{cases} 0 & \text{for } n = 1, \\ A(z)h + o(h) & \text{for } n = 2, \\ o(h) & \text{for } n = 4. \end{cases} \]


15. A regular diffusion process $\{X(t)\}$ on $[0, \infty)$ has continuous infinitesimal parameters
$\mu(x)$ and $\sigma^2(x) > 0$ for $x \ge 0$, and 0 is a regular boundary. Suppose that reflecting
barrier motion is prescribed at the boundary 0 and consider the problem of evaluating

\[ w(x) = E\left[ \int_0^T g(X(s))\, ds \,\Big|\, X(0) = x \right] \quad \text{for } 0 \le x \le a \]

for a given function $g(x)$, where $T = \inf\{t \ge 0 : X(t) = a\}$ is the hitting time to $a$. Suppose
that $|\mu(0)| < \infty$ and $0 < \sigma^2(0) < \infty$. Show that $w(x)$ is the solution to the differential
equation

\[ \mu(x) w'(x) + \tfrac{1}{2} \sigma^2(x) w''(x) = -g(x) \quad \text{for } 0 < x < a \]

with boundary conditions $w(a) = 0$, $w'(0) = 0$.


Hint: Let $s(x) = \exp\{-\int_0^x [2\mu(\xi)/\sigma^2(\xi)]\, d\xi\}$ and $m(x) = 1/[\sigma^2(x) s(x)]$ for $x \ge 0$ be
the scale and speed densities for the process, and let $\{Y(t)\}$ be the diffusion process on
$(-\infty, +\infty)$ having scale density $s_Y(y) = s(|y|)$ and speed density $m_Y(y) = m(|y|)$.
Then $\{Y(t)\}$ has infinitesimal parameters $\mu_Y(y) = (\operatorname{sgn} y)\, \mu(|y|)$ and $\sigma_Y^2(y) = \sigma^2(|y|)$.
Use the fact that the reflecting barrier phenomenon is equivalent to setting $X(t) = |Y(t)|$.
Let $U = \inf\{t \ge 0 : |Y(t)| = a\}$ and

\[ w_Y(y) = E\left[ \int_0^U g(|Y(s)|)\, ds \,\Big|\, Y(0) = y \right]. \]

Then $\tfrac{1}{2} (d/dM_Y)[(dw_Y/dS_Y)](y) = -g(|y|)$ for $-a < y < a$. Use the symmetry $w_Y(y) =
w(|y|)$ and the smoothness of $w_Y(y)$, $s_Y(y)$, and $m_Y(y)$ at zero to deduce $w_Y'(0) = w'(0) = 0$.


16. Let $\{X(t)\}$ be a Brownian motion on $I = [0, \infty)$ with drift $\mu$ and variance parameter
$\sigma^2 = 1$, where 0 is a reflecting boundary. Let $T_a$ be the hitting time to level $a > 0$
and set $v(x) = E[T_a \mid X(0) = x]$ for $0 \le x \le a$. Obtain $v(x)$ by solving the differential
equation

\[ \tfrac{1}{2} v''(x) + \mu v'(x) = -1, \qquad 0 < x < a, \]

subject to $v'(0) = 0$, $v(a) = 0$.


Answer: When $\mu \ne 0$, then

\[ v(x) = \frac{1}{\mu} \left[ (a - x) - \frac{1}{2\mu} \left( e^{-2\mu x} - e^{-2\mu a} \right) \right], \qquad 0 \le x \le a; \]

when $\mu = 0$, then

\[ v(x) = a^2 - x^2. \]
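As a quick consistency check on the answer to Problem 16 (a sketch added here, not part of the original problem), take the candidate $v(x) = \mu^{-1}[(a - x) - (2\mu)^{-1}(e^{-2\mu x} - e^{-2\mu a})]$ for $\mu \ne 0$ and differentiate it directly:

```latex
\begin{align*}
v'(x)  &= \frac{1}{\mu}\left( e^{-2\mu x} - 1 \right), &
v''(x) &= -2 e^{-2\mu x}, \\
\tfrac{1}{2} v''(x) + \mu v'(x) &= -e^{-2\mu x} + e^{-2\mu x} - 1 = -1, &
v'(0) &= 0, \quad v(a) = 0 .
\end{align*}
% Letting \mu \to 0 in the candidate (expand the exponentials to second order)
% recovers v(x) = a^2 - x^2, for which \tfrac12 v'' = -1, v'(0) = 0, v(a) = 0.
```

Both the differential equation and the two boundary conditions are satisfied, so the stated answer is at least self-consistent.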


17. Let $\{X(t)\}$ be a diffusion process on $[0, \infty)$ with infinitesimal parameters $\mu(x) =
-a < 0$ and $\sigma^2(x) = 1$ for $x > 0$. Suppose 0 is a reflecting barrier. Show that $\psi(\xi) =
2a e^{-2a\xi}$, $\xi > 0$, is a stationary density.


18. Let $\{X(t)\}$ be a Brownian motion with drift $\mu$ and variance parameter $\sigma^2$, but
modified as follows: (i) 0 is a reflecting barrier; (ii) whenever the process reaches $a > 0$,
it is instantaneously restarted at 0. Determine the stationary density $\psi(\xi)$.


Answer: When $\mu \ne 0$, then $\psi(\xi) = K^{-1}\{1 - \exp[-2\mu(a - \xi)/\sigma^2]\}$, $0 \le \xi \le a$, where
$K = a - (\sigma^2/2\mu)[1 - \exp(-2\mu a/\sigma^2)]$. When $\mu = 0$, then

\[ \psi(\xi) = \frac{2}{a} \left( 1 - \frac{\xi}{a} \right), \qquad 0 \le \xi \le a. \]


19. Define $X(t) = |B(t)|^3$, where $\{B(t)\}$ is standard Brownian motion on $[0, \infty)$.
(i) Show that $X(t)$ is a diffusion process for which 0 is a reflecting boundary.
(ii) Show that $v(x) = E[T_\lambda \mid X(0) = x]$, where $T_\lambda = \inf\{t \ge 0 : X(t) = \lambda\}$, is explicitly
$v(x) = \lambda^{2/3} - x^{2/3}$ for $0 \le x \le \lambda$.
(iii) Verify that $v'(0) = -\infty$ but $dv/dS|_{x=0} = 0$, where $S(x)$ is the scale function of
the process.


20. Consider a regular diffusion process on the interval $[0, 1]$ where 0 is an absorbing
state attainable in finite expected time and 1 is a reflecting barrier. Consider the differential
operator

\[ L\varphi(x) = \mu(x) \frac{d\varphi(x)}{dx} + \tfrac{1}{2} \sigma^2(x) \frac{d^2\varphi(x)}{dx^2}, \qquad 0 < x < 1, \]

corresponding to the diffusion coefficients $\mu(x)$ and $\sigma^2(x)$. Show that the associated
Green function (analog of (3.15)) is calculated by the formula

\[ G(x, \xi) = \begin{cases}
 -\dfrac{2 u_0(x) u_1(\xi)}{\sigma^2(\xi) W(\xi)} & \text{for } 0 \le x \le \xi \le 1, \\[2ex]
 -\dfrac{2 u_0(\xi) u_1(x)}{\sigma^2(\xi) W(\xi)} & \text{for } 0 \le \xi \le x \le 1,
\end{cases} \]

where $u_0(x)$, up to a scalar multiple, is a positive solution of $Lu_0(x) = 0$, $0 < x < 1$, and
$u_0(0) = 0$, and $u_1(x)$ solves $Lu_1(x) = 0$, $0 < x < 1$, and $u_1'(1) = 0$, and $W(\xi) = u_0(\xi) u_1'(\xi)
- u_0'(\xi) u_1(\xi)$.

21. Let $\{X(t)\}$ and $\{Y(t)\}$ be independent regular diffusion processes on the same
interval $I$ and having the same transition function $P(t, x, E)$ for $x$ in $I$, $E \subset I$, and $t > 0$.
Suppose that $X(0) = x$, $Y(0) = y$ with $x < y$, and let $A$ and $B$ be sets in $I$ with $A < B$
in the sense that $a < b$ for all $a$ in $A$ and $b$ in $B$.

(i) For a fixed time $t$, use a reflection argument to show that

\[ \Pr\{X(t) \in A,\ Y(t) \in B \text{ and } X(s) = Y(s) \text{ for some } s \le t \mid X(0) = x,\ Y(0) = y\}
   = \Pr\{X(t) \in B \text{ and } Y(t) \in A \mid X(0) = x,\ Y(0) = y\}. \]

(ii) Show that the determinant

\[ D = \det \begin{pmatrix} P(t, x, A) & P(t, x, B) \\ P(t, y, A) & P(t, y, B) \end{pmatrix} \]

evaluates $\Pr\{X(t) \in A,\ Y(t) \in B \text{ and } X(s) \ne Y(s) \text{ for all } s \le t\}$.

(iii) Conclude that $D \ge 0$ whenever $P(t, x, E)$ is the transition function of a regular
diffusion process and $x < y$, $A < B$.

(iv) Let $p(t, x, y)$ be the density function corresponding to $P(t, x, E)$. Assume
$p(t, x, y)$ is continuous in $y$ for each $x$. Show that if $x_1 < x_2$ and $y_1 < y_2$, then

\[ \det \begin{pmatrix} p(t, x_1, y_1) & p(t, x_1, y_2) \\ p(t, x_2, y_1) & p(t, x_2, y_2) \end{pmatrix} \ge 0. \]


22. Let X*(t) be the conditioned diffusion process discussed at the start of Section 9. 
Show that the boundary point 1 remains an exit boundary for X*(t). 


23. A Wright-Fisher diffusion $X(t)$ with no mutation or selection (cf. Section 4.A(a))
has infinitesimal parameters $\mu(x) = 0$ and $\sigma^2(x) = x(1 - x)$ for $0 < x < 1$. Consider
the related diffusion process $X^*(t)$, which is $X(t)$ conditioned on absorption at 1. (Consult
the beginning of Section 9.) Let $T^*$ be the absorption time in the conditioned process.
Compute $g(x) = E[(T^*)^2 \mid X^*(0) = x]$.


Answer: 
4 x dë l dë 
w= -ta -v foa -0+ f'a- aa - 9 FI 
x 0 S x č 
24. Consider a regular diffusion process $X(t)$ on $[0, 1]$ where both boundaries are exit.
Let $X_i^*(t)$ be the process conditioned on absorption at $i$ for $i = 0, 1$. Prove that the
expected occupation time of an interval $J = (\alpha, \beta)$ by $X_1^*(t)$ starting from $X_1^*(0) < \alpha$
is the same as that of $X_0^*(t)$ starting from $X_0^*(0) > \beta$.

Hint: Let $G_1^*(x, y)$ and $G_0^*(x, y)$ be the Green functions corresponding to the two
conditioned processes. Show that $G_1^*(x, y) = G_0^*(y, x)$ by direct calculation from the
infinitesimal parameters of the conditioned processes.


25. Let $\{X(t), 0 \le t \le 1\}$ be a standard Brownian motion but conditioned to be
normally distributed with mean zero and variance $\sigma^2$ at time $t = 1$. Show that $\{X(t)\}$
is a time inhomogeneous diffusion with infinitesimal parameters

\[ \mu(x, t) = \frac{x(\sigma^2 - 1)}{1 + t(\sigma^2 - 1)}, \qquad \sigma^2(x, t) = 1. \]


26. Let $\{X(t), t \ge 0\}$ be a standard Brownian motion and define $Y(t) = [X(t)]^3$.

(a) Argue that $\{Y(t)\}$ is a regular diffusion process on $(-\infty, +\infty)$.

Hint: $g(x) = x^3$ is strictly monotonic and continuous.

(b) Derive the infinitesimal parameters $\mu_Y(y)$ and $\sigma_Y^2(y)$. Is there a contradiction
between the facts (i) $\mu_Y(0) = \sigma_Y^2(0) = 0$, and (ii) the $\{Y(t)\}$ process is regular on
$(-\infty, +\infty)$?

Answer: $\mu_Y(y) = 3y^{1/3}$; $\sigma_Y^2(y) = 9y^{4/3}$.

(c) Consider the differential operator $L = \mu_Y(y)\, d/dy + \tfrac{1}{2} \sigma_Y^2(y)\, d^2/dy^2$ operating on
twice continuously differentiable functions $f$. Show that $Lf(0) = 0$ for all such $f$. (Compare
with Theorem 12.1.)

(d) Evaluate the scale and speed functions.

Answer: $S_Y(y) = y^{1/3}$, $M_Y(y) = y^{1/3}$.

(e) Let $T = T_1 = \min\{t \ge 0 : |Y(t)| \ge 1\} = \min\{t \ge 0 : |X(t)| \ge 1\}$. Determine
$v(y) = E[T \mid Y(0) = y]$.

Answer: $v(y) = 1 - y^{2/3}$, $-1 \le y \le 1$.

(f) Then $v$ is not twice continuously differentiable and hence, not in the domain
of the differential operator $L$. Show that $v$ is in the domain of the differential operator
$A = \tfrac{1}{2} (d/dM)(d/dS)$ and that $Av(y) = -1$ for $-1 < y < 1$.
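The coefficients in Problem 26(b) and the identity in part (f) can be sketched as follows. This is an illustration added here; it assumes Ito's formula may be applied to $Y = X^3$ with $X$ a standard Brownian motion, and it uses $S_Y(y) = M_Y(y) = y^{1/3}$ and $v(y) = 1 - y^{2/3}$.

```latex
\begin{align*}
dY(t) &= 3X(t)^2\, dX(t) + \tfrac{1}{2} \cdot 6X(t)\, dt
       = 3Y(t)^{1/3}\, dt + 3Y(t)^{2/3}\, dX(t), \\
\mu_Y(y) &= 3y^{1/3}, \qquad \sigma_Y^2(y) = \bigl( 3y^{2/3} \bigr)^2 = 9y^{4/3}. \\[1ex]
% Part (f): write v in terms of the scale variable S = y^{1/3} = M.
v &= 1 - S^2, \qquad \frac{dv}{dS} = -2S = -2M, \qquad
\tfrac{1}{2} \frac{d}{dM} \frac{dv}{dS} = \tfrac{1}{2}(-2) = -1 .
\end{align*}
```

The second computation never differentiates $v$ with respect to $y$ at $y = 0$, which is why $v$ can lie in the domain of $A$ although it is not twice continuously differentiable there.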


27. Let $\varphi(\xi)$, $-\infty < \xi < \infty$, be a bounded twice continuously differentiable function
with $\varphi'(\xi) \to 0$ as $|\xi| \to \infty$. Let $\{B(t)\}$ be standard Brownian motion. Show that the
process

\[ Z(t) = \varphi(B(t)) - \tfrac{1}{2} \int_0^t \varphi''(B(s))\, ds \]

is a martingale.


Hint: Use Dynkin’s formula (11.37). 


28. Consider any solution $\varphi(x)$ to $\tfrac{1}{2}\varphi''(x) + q(x)\varphi(x) = 0$, $-\infty < x < \infty$, where $q(x)$
is bounded and continuous. Show that

\[ M(t) = \varphi(B(t)) \exp\left[ \int_0^t q(B(s))\, ds \right] \]

is a martingale with respect to the standard Brownian motion $\{B(t)\}$.


Hint: Using the Ito stochastic calculus formula $d\varphi(B(t)) = \varphi'(B(t))\, dB(t) + \tfrac{1}{2}\varphi''(B(t))\, dt$,
show that $dM(t) = \varphi'(B(t)) \exp[\int_0^t q(B(s))\, ds]\, dB(t)$, whence $M(t)$ is an Ito integral.


29. Let $\{R(t), t \ge 0\}$ denote the distance of three-dimensional Brownian motion from
the origin (i.e., the Bessel process for $d = 3$). Show that

\[ Z(t) = \frac{\sinh(\alpha R(t))}{\alpha R(t)} \exp(-\tfrac{1}{2}\alpha^2 t) \]

is a martingale for each fixed $\alpha > 0$.


Hint: The function $\varphi(r, t) = [\alpha r]^{-1} \sinh[\alpha r] \exp(-\tfrac{1}{2}\alpha^2 t)$ satisfies the time reversed
backward differential equation

\[ -\frac{\partial \varphi}{\partial t} = \frac{1}{2} \frac{\partial^2 \varphi}{\partial r^2} + \frac{1}{r} \frac{\partial \varphi}{\partial r}. \]

30. The following diffusion on $I = (0, 1)$ arises in modeling gene frequency fluctuations:

\[ \mu(x) = x(1 - x)[s_1 x - s_2(1 - x) - v_1 x^2 + v_2(1 - x)^2 + r x(1 - x)(2x - 1)], \]
\[ \sigma^2(x) = x^2(1 - x)^2[v_1 x^2 + v_2(1 - x)^2 - 2r x(1 - x)], \]

where $v_1 > 0$, $v_2 > 0$, and $r^2 < v_1 v_2$. Verify the following:

(i) If $s_1/v_1 > \tfrac{1}{2}$ and $s_2/v_2 < \tfrac{1}{2}$, then 0 is an entrance boundary while 1 is an exit
boundary;

(ii) If $s_1/v_1 < \tfrac{1}{2}$ and $s_2/v_2 > \tfrac{1}{2}$, then 0 is an exit boundary while 1 is entrance;

(iii) If $s_1/v_1 > \tfrac{1}{2}$ and $s_2/v_2 > \tfrac{1}{2}$, then both boundaries are entrance;

(iv) If $s_1/v_1 < \tfrac{1}{2}$ and $s_2/v_2 < \tfrac{1}{2}$, then both boundaries are exit.


31. Let $\{B(t)\}$ be a standard Brownian motion and $f(x)$, $-\infty < x < \infty$, an integrable
function with $\int_{-\infty}^{\infty} f(x)\, dx \ne 0$. For each $\lambda > 0$ consider the process

\[ Y_\lambda(t) = \frac{1}{\sqrt{\lambda}} \int_0^{\lambda t} f(B(\tau))\, d\tau. \]

Show that the first four moments of $Y_\lambda(t)$ converge as $\lambda \to \infty$ for each fixed $t$. [It can be shown
that $Y_\lambda(t)$ converges in distribution as $\lambda \to \infty$ to $c(f)\, l_0(t)$, where $l_0(t)$ is the local time
process at zero of a standard Brownian motion and $c(f)$ is a constant.]


32. Suppose the market price $S(t)$ of a stock fluctuates according to a diffusion process
having infinitesimal parameters $\mu(s) = \mu s$ and $\sigma^2(s) = \sigma^2 s^2$ for $0 < s < \infty$. Suppose
that the value of an option on the stock is a function $W = F(S(t), T - t)$ of $S(t)$ and
the remaining time $T - t$ ($T$ is fixed) to exercise the option. In Section 15.D it is
established that $F$ satisfies the differential equation $-F_t + r s F_s + \tfrac{1}{2}\sigma^2 s^2 F_{ss} = rF$,
subject to $F(s, 0) = (s - a)^+ = \max\{0, s - a\}$, where $a$ is the fixed cost to exercise the
option and $r$ is the secure interest rate in the economy.

(i) Validate the representation

\[ F(x, t) = E[e^{-rt} (X(t) - a)^+ \mid X(0) = x], \]

where $X(t)$ is geometric Brownian motion with parameters $\mu_X(x) = rx$ and $\sigma_X^2(x) = \sigma^2 x^2$.

(ii) Calculate the explicit formula

\[ F(x, t) = x\, \Phi\!\left( \frac{\ln(x/a) + (r + \tfrac{1}{2}\sigma^2) t}{\sigma\sqrt{t}} \right)
            - a e^{-rt}\, \Phi\!\left( \frac{\ln(x/a) + (r - \tfrac{1}{2}\sigma^2) t}{\sigma\sqrt{t}} \right), \]

where $\Phi(z) = (2\pi)^{-1/2} \int_{-\infty}^{z} \exp(-\tfrac{1}{2}\xi^2)\, d\xi$.


Hint: (i) Show that $F(x, t)$ as given solves the requisite differential equation and
boundary condition.

(ii) $\ln X(t)$ is normally distributed with mean $\ln x + (r - \tfrac{1}{2}\sigma^2)t$ and variance
$\sigma^2 t$.
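The representation and the explicit formula in Problem 32 can be compared numerically. The sketch below is an added illustration, not the derivation the problem asks for: it assumes the closed form $x\Phi(d_1) - a e^{-rt}\Phi(d_2)$ with $d_1 = [\ln(x/a) + (r + \tfrac{1}{2}\sigma^2)t]/(\sigma\sqrt{t})$ and $d_2 = d_1 - \sigma\sqrt{t}$, and it evaluates the expectation by a midpoint rule over the normal law of $\ln X(t)$ given in the hint. Function names, the truncation `width`, and the sample parameters are arbitrary.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def option_value(x, t, a, r, sigma):
    """Closed form x * Phi(d1) - a * exp(-r t) * Phi(d2)."""
    s = sigma * math.sqrt(t)
    d1 = (math.log(x / a) + (r + 0.5 * sigma**2) * t) / s
    d2 = d1 - s
    return x * norm_cdf(d1) - a * math.exp(-r * t) * norm_cdf(d2)

def option_value_by_quadrature(x, t, a, r, sigma, n=200000, width=10.0):
    """E[exp(-r t) (X(t) - a)^+ | X(0) = x], with ln X(t) normal with
    mean ln x + (r - sigma^2/2) t and variance sigma^2 t."""
    mean = math.log(x) + (r - 0.5 * sigma**2) * t
    sd = sigma * math.sqrt(t)
    h = 2.0 * width / n
    total = 0.0
    for i in range(n):
        z = -width + (i + 0.5) * h  # abscissa for a standard normal variable
        payoff = max(math.exp(mean + sd * z) - a, 0.0)
        total += payoff * math.exp(-0.5 * z * z)
    return math.exp(-r * t) * total * h / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    args = (100.0, 1.0, 100.0, 0.05, 0.2)  # x, t, a, r, sigma
    print(option_value(*args), option_value_by_quadrature(*args))
```

Agreement of the two numbers for several parameter choices is consistent with part (ii) but does not replace the verification requested in part (i).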


33. Consider a diffusion {X(t), Y(t)} in two dimensions on the triangle 0 < x, y and 
x + y < 1, and having the infinitesimal parameters 


Mx(x, y) =a — (a+ 7)x, My y) = B- (B + vy; 
ok x(x, y) = x(L— x), oF vO, y)= WA y) ož% y) = —2xy. 
(This is the analog of the Wright—Fisher diffusion model (Section 2.F(a)) under reversible 
mutation pressures involving three allele types and with no selection.) Establish that the 
Dirichlet density 
_T@+Bb+y), 
Por) 


is a stationary density by showing it satisfies the forward Kolmogorov equation 


f(x, y) gi ai ee? ees 


1 2 2 2 


14 
zza EOT IG -a T [(xy) f(x, y)] + 2 ay? BA — y) S(x, y)] 


ô 
— 3, tle — @ + B+ xS y) 
ô 
= $ — (a + B + yl, y)} = 0. 


34. Let $Y(t)$ be a zero mean process whose covariance function $R(s, t) = E[Y(s)Y(t)]$
has a continuous derivative $\rho(s, t) = \partial^2 R(s, t)/\partial s\, \partial t$. For $\alpha > 0$ define the approximate
derivative $Y_\alpha(t) = [Y(t + \alpha) - Y(t)]/\alpha$.

(i) Show $E[Y_\alpha(t) Y_\beta(t)] \to \rho(t, t)$ as $\alpha \to 0$, $\beta \to 0$.


Hint: Use (14.9).


(ii) Show that $\|Y_\alpha(t) - Y_\beta(t)\|^2 \to 0$ as $\alpha \to 0$, $\beta \to 0$, where $\|Z\|^2 = E[Z^2]$ is mean
square distance. Hence $\{Y_\alpha(t)\}$ satisfies the Cauchy criterion for convergence in mean
square and there exists a random variable $Y'(t)$ for which $Y_\alpha(t) \to Y'(t)$ in mean square as $\alpha \to 0$.


35. For any regular diffusion process $\{X(t)\}$ on $(l, r)$ having continuous infinitesimal
coefficients show that there exists an $\alpha > 0$ for which

\[ E[\exp\{\alpha(T_a \wedge T_c)\} \mid X(0) = b] < \infty, \qquad l < a < b < c < r. \]

Hint: Refer to Eq. (11.50).


36. Consider a population size $N(t)$ whose fluctuations are governed by the differential
equation $dN(t) = N(t)\, dW(t)$, where $\{W(t)\}$ is Brownian motion with drift $\mu$ and
variance $\sigma^2$. Show that the solution in the Stratonovich sense has transition density

\[ p(t, x, y) = \frac{1}{y\sqrt{2\pi\sigma^2 t}} \exp\left\{ -\frac{[\ln(y/x) - \mu t]^2}{2\sigma^2 t} \right\} \]

and moments

\[ E[N(t) \mid N(0) = n_0] = n_0 \exp[(\mu + \tfrac{1}{2}\sigma^2)t], \]
\[ \operatorname{var}[N(t) \mid N(0) = n_0] = n_0^2 \exp[(2\mu + \sigma^2)t]\,[e^{\sigma^2 t} - 1]. \]
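The moment formulas in Problem 36 can be checked against the transition density by quadrature. This sketch is an added illustration under the assumption that the density is the lognormal form $p(t, x, y) = [y\sqrt{2\pi\sigma^2 t}]^{-1} \exp\{-[\ln(y/x) - \mu t]^2/(2\sigma^2 t)\}$; the names and parameter values are arbitrary.

```python
import math

def transition_density(t, x, y, mu, sigma):
    """Lognormal transition density p(t, x, y) assumed for the Stratonovich solution."""
    var = sigma**2 * t
    return (math.exp(-(math.log(y / x) - mu * t) ** 2 / (2.0 * var))
            / (y * math.sqrt(2.0 * math.pi * var)))

def moment(k, t, x, mu, sigma, n=100000, width=10.0):
    """k-th moment of N(t) given N(0) = x, by the midpoint rule in z = ln y."""
    center = math.log(x) + mu * t
    sd = sigma * math.sqrt(t)
    lo, hi = center - width * sd, center + width * sd
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = math.exp(lo + (i + 0.5) * h)
        total += y**k * transition_density(t, x, y, mu, sigma) * y  # dy = y dz
    return total * h

if __name__ == "__main__":
    t, n0, mu, sigma = 1.0, 2.0, 0.1, 0.3
    m1, m2 = moment(1, t, n0, mu, sigma), moment(2, t, n0, mu, sigma)
    print(m1, n0 * math.exp((mu + 0.5 * sigma**2) * t))
    print(m2 - m1**2,
          n0**2 * math.exp((2 * mu + sigma**2) * t) * (math.exp(sigma**2 * t) - 1.0))
```

Each printed pair compares a numerically integrated moment with the corresponding closed form.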


37. Solve (in the Stratonovich sense) the following equation describing random
growth under a restraining force:

\[ dN(t) = N(t)[1 - N(t)/K]\, dB(t), \qquad 0 < N(0) < K, \]

where $\{B(t)\}$ is a Brownian motion with drift $\mu$ and variance $\sigma^2$.


Hint: Show that

\[ Y(t) = \int_{N(0)}^{N(t)} \frac{d\xi}{\xi(1 - \xi/K)} \]

is a Brownian motion with drift $\mu$ and variance $\sigma^2$.


38. Given the Stratonovich stochastic differential equation

\[ dX = -\beta g(X(t)) \left\{ \int_0^{X(t)} [g(\xi)]^{-1}\, d\xi \right\} dt + g(X(t))\, dB, \]

where $g(\xi) > 0$ for all real $\xi$, $\beta > 0$ and $\{B(t)\}$ is standard Brownian motion, we define

\[ Y(t) = \int_0^{X(t)} [g(\xi)]^{-1}\, d\xi. \]

(i) Show that $dY = -\beta Y\, dt + dB$, i.e., $Y$ is an Ornstein-Uhlenbeck process.
(ii) Find the transition density function for $\{X(t)\}$.


39. Let $\{X(t)\}$ be a diffusion process with infinitesimal operator $A$ and suppose that
the function $f(x) \equiv 1$ is in $\mathcal{D}(A)$, the domain of $A$, and that $Af(x) = 0$. Show that if $u$ is
in $\mathcal{D}(A)$, and $u(x_0) = \max u(x) > u(x)$ for all $x \ne x_0$, then $Au(x_0) \le 0$.


Hint: Use the Dynkin form (11.46) of the infinitesimal operator. 


40. For a fixed $\alpha < 0$, let $\mathcal{D}$ be the set of all twice continuously differentiable functions
$f$ defined on $[0, \infty)$, and where the limits of $f(x)$, $f'(x)$, and $f''(x)$ as $x \to \infty$ all exist,
with $f''(\infty) = 0$, and $\alpha f(0) + f''(0) = 0$. Define the operator $A$ on $\mathcal{D}$ by $Af(x) = \tfrac{1}{2} f''(x)$.
Show that $A$ defined on $\mathcal{D}$ generates a positive semigroup on $C[0, \infty)$.


Solution: Check that for each $g$ in $C[0, \infty)$ the equation $-\tfrac{1}{2} f'' + \lambda f = g$ admits a
unique solution $f$ in $\mathcal{D}$ as prescribed, and that if $g \ge 0$, then $f \ge 0$.


41. Consider $\mu(x)$ and $\sigma^2(x)$ continuously differentiable on $[0, 1]$ with $\sigma^2(x) > 0$,
$0 \le x \le 1$. Let $\mathcal{D}$ be the space of all twice continuously differentiable functions on
$[0, 1]$ with $f'(0) = f'(1) = 0$. Define the operator $A$ on $\mathcal{D}$ by $Af(x) = \tfrac{1}{2}\sigma^2(x) f''(x) +
\mu(x) f'(x)$. Show that $A$ determines the infinitesimal operator of a diffusion process.


Hint: Check the conditions of the Hille-Yosida theorem.

42. Let $\mathcal{D}$ be the set of all twice continuously differentiable functions $f$ defined on the
circle of circumference 1 (i.e., $f$ is twice continuously differentiable and is periodic of
period 1). Define the operator $A$ on $\mathcal{D}$ by $Af(x) = \tfrac{1}{2} f''(x)$.

(i) Check the hypotheses of the Hille-Yosida theorem affirming that A qualifies 
as an infinitesimal operator. 

(ii) Establish that the transition density of the induced process possesses the 
spectral representation 


2972 


2: ie} 
pt, xy =+- exp(- COS NZX COS nTy. 
T n=1 
(iii) What is the stationary distribution of this process, commonly called Brownian 
motion on the circle? 


43. Consider a semigroup $U_t$ for a Markov process. Let $A$ be the infinitesimal generator
with domain $\mathcal{D}$. If $v$ is in $\mathcal{D}$, show that

\[ \sup_x |\lambda R_\lambda v(x) - v(x)| \to 0 \quad \text{as } \lambda \to \infty. \]


44. Consider Brownian motion restricted to the closed interval $I = [0, 1]$ with 0 and 1
acting as reflecting barriers. Show that $Af(x) = \tfrac{1}{2} f''(x)$ with $f'(0) = f'(1) = 0$ describes
the infinitesimal operator of this diffusion.


45. For reflecting Brownian motion show that the infinitesimal generator $A$ is characterized
by the domain $\mathcal{D}$ of all functions $f(x)$ on $[0, \infty)$ with $f(x)$, $f'(x)$, and $f''(x)$ continuous
on the interval $[0, \infty]$ with $f''(\infty) = 0$ and $f'(0) = 0$.


Hint: Using the explicit form of the corresponding semigroup $U_t$, show that $[U_h f(0)
- f(0)]/h$ behaves like

\[ f'(0) \sqrt{\frac{2}{\pi h}} \quad \text{as } h \to 0. \]


46. Consider Brownian motion on $[0, \infty)$ such that when 0 is first attained, the process
is killed. Show that $f(0) = 0$ for every function $f$ in the domain of the infinitesimal
operator.


47. Consider absorbing Brownian motion on $[0, \infty)$ where 0 is a trap state. Show that
the domain $\mathcal{D}$ of the infinitesimal operator $A$ has $f$, $f'$, and $f''$ continuous on $[0, \infty]$ and

\[ f''(0) = 0. \]


48. Let $\mathcal{D}$ be the space of functions $f(x)$ defined on $[0, \infty)$ for which $f(x)$, $f'(x)$, and
$f''(x)$ are continuous and bounded, $\lim_{x \to \infty} f''(x) = 0$ and $f'(0) = 0$. For $f$ in $\mathcal{D}$, define
the operator $A$ by $Af(x) = \tfrac{1}{2} f''(x)$. Show that $A$ satisfies the conditions of the Hille-Yosida
theorem on the space $C[0, \infty]$.


Hint: Check that (i) $\mathcal{D}$ is dense in $C[0, \infty]$ and (ii) for each $g$ in $C[0, \infty]$ and $\lambda > 0$
there exists a unique $f$ in $\mathcal{D}$ for which

\[ \lambda f(x) - \tfrac{1}{2} f''(x) = g(x). \]

49. Show that the spectral representation of the transition density for absorbing
Brownian motion is proportional to the Fourier sine transform

\[ \int_0^\infty e^{-\lambda^2 t/2} \sin x\lambda\, \sin y\lambda\, d\lambda. \]
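The Fourier sine representation in Problem 49 can be compared numerically with the density of Brownian motion absorbed at 0. The sketch below is an added illustration: it takes the integrand to be $e^{-\lambda^2 t/2} \sin x\lambda\, \sin y\lambda$, uses the constant of proportionality $2/\pi$, and compares with the reflection-principle form $\phi_t(y - x) - \phi_t(y + x)$, where $\phi_t$ is the normal density with variance $t$. That closed form is a standard fact and is not stated in the problem; the function names and the truncation point `cutoff` are arbitrary.

```python
import math

def spectral_density(t, x, y, cutoff=40.0, n=200000):
    """(2/pi) * int_0^cutoff exp(-lam^2 t / 2) sin(x lam) sin(y lam) d lam (midpoint rule)."""
    h = cutoff / n
    total = 0.0
    for i in range(n):
        lam = (i + 0.5) * h
        total += math.exp(-0.5 * lam * lam * t) * math.sin(x * lam) * math.sin(y * lam)
    return (2.0 / math.pi) * total * h

def reflection_density(t, x, y):
    """Density of Brownian motion absorbed at 0, by the reflection principle."""
    def normal(u):
        return math.exp(-u * u / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
    return normal(y - x) - normal(y + x)

if __name__ == "__main__":
    print(spectral_density(0.5, 1.0, 1.5), reflection_density(0.5, 1.0, 1.5))
```

The truncation at `cutoff` is harmless for moderate $t$ because the integrand decays like $e^{-\lambda^2 t/2}$.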


50. Determine the spectral representation for standard Brownian motion in N di- 
mensions. 


Answer: The inverse Fourier transform of the multinormal characteristic function. 
51. Determine the spectral representation for Brownian motion with drift. 


52. Define Z(t) = {{** B(s)ds — B(t) where {B(t)} is standard Brownian motion. 
Show that Z(t) has stationary increments. 


53. Let $\{X(t), t \ge 0\}$ be a Bessel process of order $\alpha$ on $I = [0, \infty)$. (Consult Example 6
on page 236.) Then $Au = \tfrac{1}{2} u'' + [\tfrac{1}{2}(\alpha - 1)/x]\, u'$. Assume that $\alpha \ge 2$, or equivalently, that
0 is an entrance boundary. (See Example 3 of Section 6.) Then

\[ \mathcal{D}(A) = \{u : u \in C[0, \infty),\ u(x) \to 0 \text{ and } Au(x) \to 0 \text{ as } x \to \infty;\ x^{\alpha - 1} u'(x) \to 0 \text{ as } x \to 0\}. \]

Let $\sigma_K$ be the first passage time out of $[0, K]$.

(i) Show that $E_x[\sigma_K] = (K^2 - x^2)/\alpha$ for $0 \le x \le K$.

(ii) Show that $E_x[\sigma_K^2] = 4K^2(K^2 - x^2)/[\alpha^2(\alpha + 2)] + (K^2 - x^2)^2/[\alpha(\alpha + 2)]$ for $0 \le
x \le K$.


Hint: (i) Apply the Dynkin formula as in Section 11. As in Eq. (11.41), take

\[ u(x) = \begin{cases} K^2 - x^2 & \text{for } 0 \le x \le K, \\ \text{very smooth} & \text{for } x > K. \end{cases} \]

Show that $u$ is in $\mathcal{D}(A)$ and follow the method of (11.40) to (11.43).

(ii) Take

\[ v(x) = \begin{cases} (K^2 - x^2)^2 & \text{for } 0 \le x \le K, \\ \text{very smooth} & \text{for } x > K, \end{cases} \]

and follow the approach of (12.24) to (12.25).
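The computations behind the hint to Problem 53 can be sketched as follows. This is an added illustration that assumes the generator $Au = \tfrac{1}{2}u'' + [\tfrac{1}{2}(\alpha - 1)/x]u'$ and works on $0 \le x \le K$ only; write $u = K^2 - x^2$.

```latex
\begin{align*}
Au &= \tfrac{1}{2}(-2) + \frac{\alpha - 1}{2x}(-2x) = -\alpha, \\
A(u^2) &= \tfrac{1}{2}\left( -4u + 8x^2 \right) + \frac{\alpha - 1}{2x}(-4xu)
        = 4K^2 - (2\alpha + 4)\,u .
\end{align*}
% Hence u/\alpha solves A w = -1 with w(K) = 0, which is the equation for E_x[\sigma_K].
% For the second moment, w = c_1 u + c_2 u^2 must solve A w = -2u/\alpha, which gives
%   c_2 = 1/[\alpha(\alpha + 2)],   c_1 = 4K^2/[\alpha^2(\alpha + 2)],
% in agreement with the expression stated in part (ii).
```

These identities confirm the stated answers formally; the rigorous step is the application of Dynkin's formula, which the hint addresses.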


54. Consider a regular diffusion on $(l, r)$. Give reasons that if both ends are entrance
boundaries, then the process converges to a unique limit stationary distribution. Provide
examples showing that, when both boundaries are natural, a limiting distribution may
or may not exist.


55. Consider a diffusion process $\{X(s), s \ge 0\}$ on the state space $I = [0, \infty)$ with
$\mu(x) = 0$ and $\sigma^2(x) = x$. Define the Kac functional $E_x[\exp\{-\lambda \int_0^t X(s)\, ds\}] = v(t, x)$.
Show by direct verification that $w(t, x) = \exp(-x\sqrt{2\lambda}\, \tanh(t\sqrt{\lambda/2}))$ is a solution to

\[ \frac{\partial v}{\partial t} = \frac{x}{2} \frac{\partial^2 v}{\partial x^2} - \lambda x v \quad \text{and} \quad v(t, 0) = 1. \]

Conclude that $w(t, x) = v(t, x)$.

56. Let $B(t)$ be standard Brownian motion. Show that the Kac functional

\[ v(t, x, \lambda) = E_x\!\left[ \exp\left\{ -\lambda \int_0^t B(s)^2\, ds \right\} \right], \qquad \lambda > 0, \]

is

\[ v(t, x, \lambda) = \frac{\exp\{-\sqrt{2\lambda}\, (x^2/2) \tanh(t\sqrt{2\lambda})\}}{\sqrt{\cosh(t\sqrt{2\lambda})}}. \]


57. Establish that a probability density $g(x)$ for a diffusion $X(t)$ of semigroup $U_t$ with
infinitesimal generator $A$ is a stationary density if and only if

\[ \int g(x)\, Au(x)\, dx = 0 \quad \text{for all } u \in \mathcal{D}(A). \]


58. Consider a sequence of M/M/1 queueing systems indexed by $n = 1, 2, \ldots$. Thus,
for the $n$th system customers arrive according to a Poisson process of parameter $\lambda_n$
and are served in order of their arrival. The service times are exponentially distributed
having mean $1/\mu_n$. Define $\rho_n = \lambda_n/\mu_n$. Let $Q_n(t)$ be the number of customers waiting for
and receiving service at time epoch $t$ in the $n$th queueing system. Form the rescaled
process

\[ Y_n(t) = \frac{Q_n(nt)}{\sqrt{n}} \quad \text{for } 0 \le t \le T, \quad T \text{ fixed}. \]

Let $X(t)$, $t \ge 0$, be a Brownian motion with drift process of mean coefficient $ct$ and
variance coefficient $\sigma^2$, that is, $X(t) = \sigma B(t) + ct$ with $B(t)$ standard Brownian motion.

Let $R(t) = X(t) - \inf\{X(s) ; 0 \le s \le t\}$, which is $X(t)$ reflected at the origin. Let
$R_T(t)$ denote the restriction of $R(t)$ to $[0, T]$. Suppose as $n \to \infty$ that $\lambda_n \to \lambda$, $\mu_n \to \mu$,
$\lambda_n < \mu_n$, $[\lambda_n - \mu_n]\sqrt{n} \to c < 0$. Show that the process $Y_n(t)$ for each $t$, $0 \le t \le T$, converges
in the sense of distributions to $R_T(t)$ based on $X(t)$ with parameters $c$ and $\sigma^2 = \lambda + \mu$.


59. Let $X(t)$ be Brownian motion with drift parameter $c > 0$ and variance coefficient 1.
We will refer to $X(t)$ as an income process with $X(0) = 0$. Define the corresponding
assets process (for an initial endowment $x$) by

\[ Y(t) = e^{\beta t} x + \int_0^t e^{\beta(t - s)}\, dX(s), \qquad t \ge 0, \]

where the money earns at an interest rate $\beta$. By integration by parts, observe that

\[ Y(t) = e^{\beta t} x + X(t) + \beta \int_0^t e^{\beta(t - s)} X(s)\, ds. \]

Show that, in terms of standard Brownian motion $B(t)$, we have

\[ Y(t) = e^{\beta t} x + \frac{1}{\sqrt{2\beta}}\, B(e^{2\beta t} - 1) + \frac{c}{\beta}(e^{\beta t} - 1). \]

Find the infinitesimal mean and variance of $Y(t)$ (cf. (5.23) and (14.19)).
Let $T = \inf\{t \ge 0 : Y(t) \le 0\}$, which is called the ruin time. Find

\[ r(y) = \Pr\{T < \infty \mid Y(0) = y\} \quad \text{for } y > 0. \]

60. Find the spectral representation analogous to (13.14) on $[0, \infty)$ for


õu 1 1 Of y_, Gu Ou 
a 5 pN=1 3p r or ae a r> 0, t>0 


(y a positive parameter). What is the nature of the boundaries at 0 and $\infty$?


NOTES 


Ito and McKean’s advanced treatise [1] is virtually the definitive source on the 
theory of one-dimensional diffusion processes. 


Some of the most interesting ideas in diffusion processes find their genesis in Lévy’s 
inspiring book [2]. 

A wide ranging presentation of Markov processes is the two volume work by 
Dynkin [3], where diffusion processes occupy a prominent role. 

A compact monograph highlighting the essential concepts and techniques of 
stochastic integrals and the stochastic calculus in the Ito sense is McKean [4]. 

Important classes of stochastic processes converging to diffusion are discussed in 
the basic book of Billingsley [5]. 

The multivolume works of Gikhman and Skorokhod [6, 7] present extensive treatments
of the theory of stochastic differential equations covering one and higher
dimensions.

Diffusion process formulations and methodology play an important role in modeling 
dynamical systems for physical and engineering phenomena. In this vein, Jazwinski [8] 
and Wong [10] highlight a wide spectrum of applications. 

Friedman [13] elaborates the theory of stochastic differential equations, mainly in
the context of stochastic control problems, and in their relation to the solutions of
parabolic partial differential equations. The motivation of stochastic control and
statistical decision processes also underlies the developments in Liptser and Shiryaev
[15]. A brief account is contained in the monograph of Kushner [9].

Mandl [11] handles many facets of one-dimensional diffusions in a direct analytical
framework.

A far ranging advanced mathematical exposition by Ikeda and Watanabe is in 
press which integrates the subjects of stochastic integration with respect to generalized 
martingales instead of Brownian motion, the theory of stochastic differential equations 
with special attention to diffusion processes on manifolds, featuring theorems on com- 
parisons and approximations of different processes. 

Arnold [14] has provided a succinct presentation of stochastic differential equations 
and applications. For another approach, see McShane [12]. 

Four chapters of Ewens [16] exhibit the pervasive and important role of diffusion 
stochastic processes in treating a number of important population genetics models. 
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Chapter 16 


COMPOUNDING STOCHASTIC 
PROCESSES 


In this chapter we treat a series of isolated stochastic models, partly motivated 
by applications to astronomy, biology, engineering, and physics. These processes 
are formed by compounding various classical processes, including Poisson, 
branching, and growth processes of the diffusion type. In each case, a secondary 
process feeds into a primary process whose state variable is the object of study. 
In Section 1 we shall characterize multidimensional Poisson processes and in 
the following section give an application to astronomy. The concept of multi- 
dimensional Poisson processes will play an important part in the formulation of 
cascade or compound stochastic processes. Some of these will be studied in 
the later sections of this chapter (e.g., see Section 2). 

In Section 3 we examine a stochastic model involving growth and immigra- 
tion. In Section 4 we formulate a stochastic process of growth involving two 
types, a normal type and a mutant type. The population of wild (i.e., normal) 
types grows deterministically whereas the mutant type grows in accordance with 
the laws of a Markov branching process. Moreover, each normal type at its 
death (lifetime is exponentially distributed) changes into a mutant type and 
then reproduces like any other mutant individual. 

Sections 5 and 6 treat stochastic models of growth which take account of 
the factor of geographical distribution and spread of the population as well as 
their natural growth behavior. 

The stochastic processes investigated are typical of a large class of general 
cascade processes. The purpose of this chapter is to introduce the student to 
the richness of applications and subtleties of analysis of problems involving 
combinations of stochastic processes.

Sections 7 and 8 are devoted to a review of some deterministic models of
population growth, taking account of the age structure of the population. The
important compound Poisson process is introduced with applications in
Section 9.


1: Multidimensional Homogeneous Poisson Processes 


Poisson processes were introduced in Chapter 4 where the index parameter is 
identified as a real positive number t > 0, usually referred to as time. We now 
formulate a version of the Poisson process where the parameter value is deter- 
mined by the measure of a set in the plane or in space or even in more general 
spaces. The objective of this section is to define some versions of multidimen- 
sional Poisson processes and to describe some examples of these processes and 
their applications. 

In Chapter 4 the Poisson process $\{X(t), t \ge 0\}$ was characterized axiomatically.
It was proved that $X(t)$ has the probability distribution

\[ \Pr\{X(t) = k\} = e^{-\lambda t} \frac{(\lambda t)^k}{k!} \qquad \text{for } t > 0, \quad k = 0, 1, 2, \ldots, \]

where $\lambda$ is a positive constant interpreted as the average rate at which events are
happening per unit time. In this section we shall introduce a set of postulates
that characterize a generalized homogeneous spatial Poisson process $X(S)$,
where the parameter $S$ denotes a bounded region of the plane or space and $X(S)$
has the probability distribution

\[ \Pr\{X(S) = k\} = e^{-\lambda A(S)} \frac{[\lambda A(S)]^k}{k!} \qquad \text{for } k = 0, 1, 2, \ldots. \tag{1.1} \]

Here, $\lambda$ is a positive constant called the intensity parameter (or simply the parameter)
of the multidimensional Poisson process, and $A(S)$ represents the area or volume of $S$,
depending on whether $S$ is a region in the plane or space. The required postulates are
the following:


(i) For $X(S)$ only nonnegative integer values are assumed and $0 <
\Pr\{X(S) = 0\} < 1$ if $A(S) > 0$.

(ii) The probability distribution of $X(S)$ depends on $S$ only through the
value of $A(S)$, with the further property that if $A(S) \to 0$ then $\Pr\{X(S) \ge 1\} \to 0$.

(iii) If $S_1, S_2, \ldots, S_n$ ($n \ge 1$) are disjoint regions, then $X(S_1), \ldots, X(S_n)$ are
mutually independent random variables and

\[ X(S_1 \cup \cdots \cup S_n) = X(S_1) + \cdots + X(S_n). \]

(iv) \[ \lim_{A(S) \to 0} \frac{\Pr\{X(S) \ge 1\}}{\Pr\{X(S) = 1\}} = 1. \]

Prior to offering further descriptive discussion of these axioms it is worthwhile
to highlight some suggestive examples.


(a) In three dimensions X(S) could represent the number of stars located 
in the region S. 

(b) On a two-dimensional plate X(S) may represent the number of bacteria
of a certain species contained in the region S. 


The motivation and interpretation of the above axiom system is quite 
evident. Postulate (ii) asserts that X(S) does not depend on the shape of S but 
only on its area or volume. This appears to be reasonable with reference to 
examples (a) and (b). According to Postulate (iii), to use the terminology of
Example (a), the numbers of stars contained in disjoint regions are independent
random variables and the value of X(S) for the combined region is the sum of
X(·) for the component regions. The independence assumption seems to be a
reasonable approximation to the situation of the actual distribution of stars.
Postulate (iv) is intuitive: in a region of very small area, finding two or more
points is negligibly likely compared with finding exactly one.

Our main objective in this section is to prove the following theorem:


Theorem 1.1. If a random process X(S) defined with respect to regions S of 
Euclidean n space satisfies Postulates (i)—(iv), then X(S) has the distribution given 
in (1.1). 


Proof. Consider an arbitrary finite region S of positive area (volume) A(S) > 0.
Divide S into disjoint regions $S_1, S_2, \ldots, S_n$ of equal area (volume), i.e.,

$$S_1 \cup S_2 \cup \cdots \cup S_n = S, \qquad S_i \cap S_j = \emptyset,\quad i \ne j \quad (\emptyset = \text{empty set}),$$

$$A(S_i) = \frac{1}{n}\,A(S) \qquad \text{for all } i = 1, 2, \ldots, n.$$


Then by Postulate (iii)

$$\Pr\{X(S) = 0\} = \Pr\{X(S_1 \cup \cdots \cup S_n) = 0\} = \Pr\{X(S_1) + \cdots + X(S_n) = 0\}.$$

But Postulate (i) tells us that $\sum_{i=1}^{n} X(S_i) = 0$ can occur if and only if $X(S_i) = 0$
for all i = 1, 2, ..., n. Then, using the independence of $X(S_i)$, i = 1, 2, ..., n, as
postulated in (iii), we have

$$\Pr\{X(S) = 0\} = \Pr\{X(S_i) = 0,\ i = 1, 2, \ldots, n\} = \prod_{i=1}^{n} \Pr\{X(S_i) = 0\}.$$

According to Postulate (ii) $\Pr\{X(S_i) = 0\}$ depends only on $A(S_i) = (1/n)A(S)$;
hence

$$\Pr\{X(S_1) = 0\} = \Pr\{X(S_2) = 0\} = \cdots = \Pr\{X(S_n) = 0\}.$$

Thus, we have

$$\Pr\{X(S) = 0\} = [\Pr\{X(S_n) = 0\}]^{n}. \tag{1.2}$$

Further

$$\Pr\{X(S_n) = 0\} = 1 - \Pr\{X(S_n) \ge 1\}.$$


Taking logarithms on both sides of (1.2), which is permissible because of 
Postulate (i), yields 


$$-\log \Pr\{X(S) = 0\} = -n \log[1 - \Pr\{X(S_n) \ge 1\}] = n\left[\Pr\{X(S_n) \ge 1\} + \tfrac{1}{2}(\Pr\{X(S_n) \ge 1\})^2 + \cdots\right], \tag{1.3}$$

where we have used the expansion

$$-\log(1 - x) = x + \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 + \cdots,$$

valid when 0 ≤ x < 1. It is clear that $0 \le \Pr\{X(S_n) \ge 1\} < 1$, since otherwise
we would have $\Pr\{X(S_n) = 0\} = 0$, which implies Pr{X(S) = 0} = 0. This,
however, is impossible by Postulate (i), as we assume A(S) > 0. Now, formula
(1.3) may be rewritten in the form 


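$$-\log \Pr\{X(S) = 0\} = n \Pr\{X(S_n) = 1\}\,\frac{\Pr\{X(S_n) \ge 1\}}{\Pr\{X(S_n) = 1\}}\,\bigl[1 + O(\Pr\{X(S_n) \ge 1\})\bigr]. \tag{1.4}$$

The interpretation of $O(\Pr\{X(S_n) \ge 1\})$ is as usual: The quantity

$$\frac{O(\Pr\{X(S_n) \ge 1\})}{\Pr\{X(S_n) \ge 1\}}$$

is bounded as n → ∞. Note by Postulate (ii) that

$$\Pr\{X(S_n) \ge 1\} \to 0 \qquad \text{since } A(S_n) = \frac{1}{n}A(S) \to 0,\quad n \to \infty.$$

Further, it follows by Postulate (iv) that

$$\lim_{n \to \infty} \frac{\Pr\{X(S_n) \ge 1\}}{\Pr\{X(S_n) = 1\}} = \lim_{A(S_n) \to 0} \frac{\Pr\{X(S_n) \ge 1\}}{\Pr\{X(S_n) = 1\}} = 1.$$

Hence, letting n → ∞ on both sides of (1.4) yields

$$-\log \Pr\{X(S) = 0\} = \lim_{n \to \infty} n \Pr\{X(S_n) = 1\}. \tag{1.5}$$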


Because of Postulate (i) the left side must necessarily be positive and finite. 
Consider the generating function of X(S) and X(S,): 


$$g(s) = E[s^{X(S)}] = \sum_{k=0}^{\infty} \Pr\{X(S) = k\}\,s^k$$

and

$$g_n(s) = E[s^{X(S_n)}] = \sum_{k=0}^{\infty} \Pr\{X(S_n) = k\}\,s^k.$$

Then, by Postulates (ii) and (iii)

$$g(s) = E[s^{X(S)}] = E[s^{X(S_1) + \cdots + X(S_n)}] = \prod_{i=1}^{n} E[s^{X(S_i)}] = (E[s^{X(S_n)}])^{n};$$

that is,

$$g(s) = [g_n(s)]^{n}. \tag{1.6}$$

We may write $g_n(s)$ in the form

$$g_n(s) = \Pr\{X(S_n) = 0\} + \Pr\{X(S_n) = 1\}\,s + \Pr\{X(S_n) > 1\}\,\theta(s),$$

where |θ(s)| ≤ 1. But

$$\Pr\{X(S_n) = 0\} = 1 - \Pr\{X(S_n) = 1\} - \Pr\{X(S_n) > 1\}.$$

Hence, substituting for $\Pr\{X(S_n) = 0\}$ we get

$$g_n(s) = 1 + (s - 1)\Pr\{X(S_n) = 1\} + (\theta(s) - 1)\Pr\{X(S_n) > 1\}. \tag{1.7}$$
We will now use Postulate (iv) which asserts that 


$$\frac{\Pr\{X(S_n) > 1\}}{\Pr\{X(S_n) = 1\}} \to 0, \qquad n \to \infty, \tag{1.8}$$

and we also need the fact that $\Pr\{X(S_n) = 1\} \to 0$ as n → ∞. Indeed, we
established above (see (1.6)) that

$$g(s) = (E[s^{X(S_n)}])^{n}$$

or, what is the same,

$$[g(s)]^{1/n} = \sum_{k=0}^{\infty} \Pr\{X(S_n) = k\}\,s^k, \qquad 0 \le s \le 1.$$

By hypothesis (i) we know that g(0) = Pr{X(S) = 0} > 0. Therefore,

$$\Pr\{X(S_n) = 1\} = \frac{d}{ds}[g(s)]^{1/n}\bigg|_{s=0} = \frac{1}{n}\,g'(0)\,[g(0)]^{(1/n) - 1},$$

which clearly tends to zero as n → ∞.
Now, referring to (1.6)-(1.8) and expanding the logarithm in the form

$$\log(1 + z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \cdots = z + o(z), \qquad |z| \to 0,$$

we obtain the formula

$$\log g(s) = n \log g_n(s) = n\bigl[(s - 1)\Pr\{X(S_n) = 1\} + (\theta(s) - 1)\Pr\{X(S_n) > 1\} + o(\Pr\{X(S_n) = 1\})\bigr].$$

Taking limits on both sides as n → ∞, again using Postulate (iv) (specifically
relation (1.8)), yields

$$\log g(s) = (s - 1) \lim_{n \to \infty} n \Pr\{X(S_n) = 1\}. \tag{1.9}$$


We infer from (1.5) and (1.9) that 
log g(s) = —(s — 1) log Pr{X(S) = 0} 
or 
g(s) = exp[(s — 1)(—log Pr{X(S) = 0})]. (1.10) 


This expression is the probability generating function of a Poisson law whose 
expectation is 


E[X(S)] = —log Pr{X(S) = 0}. 


But we also know that the expectation is a nonnegative additive function that 
depends only on A(S), and this implies 


$$-\log \Pr\{X(S) = 0\} = \lambda A(S). \tag{1.11}$$

A formal proof of this last statement goes as follows. Let f denote the function
satisfying

$$E[X(S)] = f(A(S)).$$

This relation derives from Postulate (ii). We shall now prove that f is in fact
linear. Let $S_1$ and $S_2$ be two bounded disjoint sets. Then

$$E[X(S_1 \cup S_2)] = f(A(S_1) + A(S_2)),$$

owing to the additivity of A(S). On the other hand, by Postulate (iii), we have

$$E[X(S_1 \cup S_2)] = E[X(S_1)] + E[X(S_2)] = f(A(S_1)) + f(A(S_2)).$$

Since A(S) can vary from 0 to ∞, this implies that

$$f(x + y) = f(x) + f(y) \qquad \text{for all } x, y \ge 0. \tag{1.12}$$

Moreover, by its very meaning f(x) ≥ 0 and clearly f(0) = 0. The only solution
of (1.12) with these properties is the linear function f(x) = λx for some constant
λ (cf. page 125 of Chapter 4). This proves (1.11).

In view of the remarks following formula (1.5), we conclude that λ is a
positive real parameter. Substituting in (1.10) we have

$$g(s) = e^{\lambda A(S)(s - 1)}$$

or equivalently

$$g(s) = e^{-\lambda A(S)} \sum_{k=0}^{\infty} \frac{[\lambda A(S)]^k}{k!}\,s^k.$$

This confirms (1.1), that the probability distribution of X(S) is Poisson,
and the proof of Theorem 1.1 is complete. ∎
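As a numerical companion to the proof, the following sketch checks that the Poisson law (1.1) is consistent with Postulate (iv) and with the limit (1.5): cutting S into n equal pieces, n Pr{X(S_n) = 1} approaches λA(S) and the ratio Pr{X(S_n) ≥ 1}/Pr{X(S_n) = 1} approaches 1. The function names and the values of λ and A(S) are illustrative assumptions, not part of the text.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability Pr{X = k} for a count with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def subregion_quantities(lam, area, n):
    """For S cut into n pieces of area A(S)/n, return the pair
    (n * Pr{X(S_n) = 1}, Pr{X(S_n) >= 1} / Pr{X(S_n) = 1})."""
    mean_n = lam * area / n
    p_one = poisson_pmf(1, mean_n)
    # 1 - exp(-mean_n), computed stably for small mean_n
    p_at_least_one = -math.expm1(-mean_n)
    return n * p_one, p_at_least_one / p_one

# Hypothetical intensity and area, chosen only for illustration.
lam, area = 2.0, 1.5
for n in (1, 10, 100, 10_000):
    n_p_one, ratio = subregion_quantities(lam, area, n)
    print(n, round(n_p_one, 6), round(ratio, 6))
```

As n grows, the first printed column settles near λA(S) = 3 and the second near 1, in line with (1.5) and Postulate (iv).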


We next elaborate some further distribution properties of the stochastic 
process characterized by Postulates (i)-(iv). It is convenient to describe the 
event {X(S) = k} by “there are exactly k points in the region S.” 

We now show that if the process X(S) satisfies Postulates (i)—(iv), i.e., is a 
Poisson process in the plane or space, then under the condition that there is 
exactly one point in a region S of positive area, i.e., X(S) = 1, A(S) > 0, this point 
is uniformly distributed in S. Indeed, let $S = S_1 \cup S_2$, where $S_1$ and $S_2$ are
disjoint regions. Then by virtue of Postulate (iii) we have

$$\begin{aligned}
\Pr\{X(S_1) = 1 \mid X(S) = 1\} &= \frac{\Pr\{X(S_1) = 1,\ X(S) = 1\}}{\Pr\{X(S) = 1\}} \\
&= \frac{\Pr\{X(S_1) = 1,\ X(S_2) = 0\}}{\Pr\{X(S) = 1\}} \\
&= \frac{\Pr\{X(S_1) = 1\}\,\Pr\{X(S_2) = 0\}}{\Pr\{X(S) = 1\}} \\
&= \frac{\exp[-\lambda A(S_1)]\,\lambda A(S_1)\,\exp[-\lambda A(S_2)]}{\exp[-\lambda A(S)]\,\lambda A(S)}.
\end{aligned}$$

Since $S_1$ and $S_2$ are disjoint and their union is S, $A(S) = A(S_1) + A(S_2)$ and the
exponential factors cancel. Thus

$$\Pr\{X(S_1) = 1 \mid X(S) = 1\} = \frac{A(S_1)}{A(S)},$$


and this is equivalent to saying that the point in S is uniformly distributed. 
This result can be extended as follows: 


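Theorem 1.2. If X(S) fulfills Postulates (i)-(iv) then under the condition X(S)
= k, for A(S) > 0, these k points are independent and uniformly distributed in S.


Remark. The assertion that k points in S are independent and uniformly
distributed shall mean that, given any n disjoint regions $S_1, S_2, \ldots, S_n$ with
$\bigcup_{i=1}^{n} S_i = S$ and any nonnegative integers $k_1, k_2, \ldots, k_n$ with $\sum_{i=1}^{n} k_i = k$, then

$$\Pr\{k_1 \text{ points lie in } S_1;\ k_2 \text{ points lie in } S_2;\ \ldots;\ k_n \text{ points lie in } S_n \mid X(S) = k\}
= \frac{k!}{k_1!\,k_2! \cdots k_n!} \left[\frac{A(S_1)}{A(S)}\right]^{k_1} \left[\frac{A(S_2)}{A(S)}\right]^{k_2} \cdots \left[\frac{A(S_n)}{A(S)}\right]^{k_n}.$$

Proof. Let

$$S = S_1 \cup S_2 \cup \cdots \cup S_n,$$

where $S_1, S_2, \ldots, S_n$ are disjoint regions. Then for any nonnegative integers
$k_1, k_2, \ldots, k_n$ with $k_1 + k_2 + \cdots + k_n = k$

$$\begin{aligned}
\Pr\{X(S_1) = k_1, \ldots, X(S_n) = k_n \mid X(S) = k\}
&= \frac{\Pr\{X(S_1) = k_1,\ X(S_2) = k_2,\ \ldots,\ X(S_n) = k_n\}}{\Pr\{X(S) = k\}} \\
&= \frac{\Pr\{X(S_1) = k_1\}\,\Pr\{X(S_2) = k_2\} \cdots \Pr\{X(S_n) = k_n\}}{\Pr\{X(S) = k\}} \\
&= \frac{\dfrac{e^{-\lambda A(S_1)}[\lambda A(S_1)]^{k_1}}{k_1!}\,\dfrac{e^{-\lambda A(S_2)}[\lambda A(S_2)]^{k_2}}{k_2!} \cdots \dfrac{e^{-\lambda A(S_n)}[\lambda A(S_n)]^{k_n}}{k_n!}}{\dfrac{e^{-\lambda A(S)}[\lambda A(S)]^{k}}{k!}} \\
&= \frac{k!}{k_1!\,k_2! \cdots k_n!} \left[\frac{A(S_1)}{A(S)}\right]^{k_1} \left[\frac{A(S_2)}{A(S)}\right]^{k_2} \cdots \left[\frac{A(S_n)}{A(S)}\right]^{k_n},
\end{aligned}$$

because $A(S_1) + A(S_2) + \cdots + A(S_n) = A(S)$. ∎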
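The identity in this proof can be checked numerically. The sketch below computes the conditional probability directly from independent Poisson counts and compares it with the multinomial expression; the helper names and the sample areas, counts, and intensity are hypothetical values chosen for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability Pr{X = k} for a count with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def conditional_count_prob(counts, areas, lam):
    """Pr{X(S_i) = k_i for all i | X(S) = k}, built from independent
    Poisson counts on disjoint regions whose union is S."""
    k, total_area = sum(counts), sum(areas)
    joint = math.prod(poisson_pmf(ki, lam * ai) for ki, ai in zip(counts, areas))
    return joint / poisson_pmf(k, lam * total_area)

def multinomial_prob(counts, areas):
    """Multinomial probability with cell probabilities A(S_i)/A(S)."""
    k, total_area = sum(counts), sum(areas)
    coefficient = math.factorial(k)
    for ki in counts:
        coefficient //= math.factorial(ki)  # stays an exact integer at each step
    return coefficient * math.prod((ai / total_area) ** ki
                                   for ki, ai in zip(counts, areas))

counts, areas = (2, 0, 3), (0.5, 1.0, 2.5)  # illustrative values only
print(conditional_count_prob(counts, areas, lam=1.7))
print(multinomial_prob(counts, areas))
```

The two printed values agree up to rounding, and the first does not change when the intensity is varied, which reflects the cancellation of λ in the proof.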


2: An Application of Multidimensional Poisson Processes to Astronomy 


Consider stars distributed in space in accordance with a three-dimensional 
Poisson process X(S) as described in Section 1. Let x and y designate general 
three-dimensional vectors. Assume that the light intensity exerted at x by a star
located at y is f(x, y, α). Here α is a real random parameter depending on the
intensity of the star at y. We assume that the parameters α associated with
different stars are independent, identically distributed random variables possessing
a common density function k(·). We also assume that the combined intensity
exerted at the point x due to light signals of different stars accumulates additively.
Let Y(x, S) denote the total light intensity at the point x due to signals emanating
from all the stars located in region S, i.e.,

$$Y(x, S) = \sum_{y_r \in S} f(x, y_r, \alpha_r).$$

Note that the summation contains a random number of terms which is actually
finite with probability 1. We want to find the distribution of Y(x, S). We do this
by determining the Laplace transform g(z; x, S) of Y(x, S), i.e.,

$$g(z; x, S) = E[e^{-zY(x, S)}] = \int_0^{\infty} e^{-z\xi}\,h(\xi; x, S)\,d\xi, \qquad z > 0,$$

where h(·; x, S) denotes the density function of Y(x, S).

Of course, in principle, from knowledge of the Laplace transform we can
compute the moments of Y(x, S) routinely, and generally there is an inversion 
formula which calculates h in terms of g. The actual inversion expression takes 
on a rather complicated form in the case at hand and so we do not bother with it 
here. 


Conditioning on the values of X(S), we have

$$g(z; x, S) = E[e^{-zY(x, S)}] = \sum_{k=0}^{\infty} E[e^{-zY(x, S)} \mid X(S) = k]\,\Pr\{X(S) = k\}.$$

But we know according to Theorem 1.2 that under the condition X(S) = k, i.e.,
k points in S, these k points are independently uniformly distributed in S. Hence

$$E[e^{-zY(x, S)} \mid X(S) = k] = \bigl(E[e^{-zY(x, S)} \mid X(S) = 1]\bigr)^{k}.$$

To compute $E[e^{-zY(x, S)} \mid X(S) = 1]$, note that if X(S) = 1, Y(x, S) = f(x, y, α),
where y is the position of a single star in S and α is the corresponding random
parameter which reflects its intensity. Now, since the position of this star is
uniformly distributed in S, we have

$$E[e^{-zY(x, S)} \mid X(S) = 1] = \frac{1}{A(S)} \int_S \int_{-\infty}^{\infty} e^{-zf(x, y, \alpha)}\,k(\alpha)\,d\alpha\,dy,$$

where the integral with respect to y denotes a triple integral over the three-
dimensional region S. Combining the relations above in the obvious manner,
we obtain the Laplace transform of Y(x, S),

$$\begin{aligned}
g(z; x, S) &= \sum_{k=0}^{\infty} \left\{\frac{1}{A(S)} \int_S \int_{-\infty}^{\infty} e^{-zf(x, y, \alpha)}\,k(\alpha)\,d\alpha\,dy\right\}^{k} \frac{[\lambda A(S)]^k}{k!}\,\exp[-\lambda A(S)] \\
&= e^{-\lambda A(S)} \exp\left\{\lambda \int_S \int_{-\infty}^{\infty} e^{-zf(x, y, \alpha)}\,k(\alpha)\,d\alpha\,dy\right\} \\
&= \exp\left\{\lambda \int_S \left[\int_{-\infty}^{\infty} e^{-zf(x, y, \alpha)}\,k(\alpha)\,d\alpha - 1\right] dy\right\},
\end{aligned}$$

because $\int_S dy = A(S)$. We have determined g(z; x, S) in terms of f(x, y, α), k(α),
and S, which can be regarded as known or calculable on the basis of other
data.


3: Immigration and Population Growth 




The model described in this section is that of a population composed of a single 
type evolving from an initial population whose growth behavior follows the 
laws of a continuous time Markov branching process. Moreover, in addition to
the inherent growth of the population there is an influx of immigrants of the
same type which contribute further descendants whose growth behavior is
governed by the laws of the same branching process. The arrival pattern of
immigrants into the colony is generally also a stochastic process. We formulate
the process, for concreteness, as a model of growth of bacterial populations.

Consider a bacterial colony of $n_0$ individuals. Assume that each bacterium,
independent of the others, produces descendants which in turn produce further
offspring, etc. The population growth evolving from a single bacterium describes
a continuous time Markov branching process. Let F(s, t) be the probability
generating function of the population size at time t derived from a single
bacterium. Clearly, the population size at time t due to the existing colony of size
$n_0$ at time 0 is a random variable which has the generating function $[F(s, t)]^{n_0}$.
Assume, furthermore, that immigration of new bacteria occurs at times $\tau_j$,
j = 1, 2, ..., N. Each immigrant will evolve a population following the same laws
of reproduction as the original $n_0$ bacteria, independent of them and each other.
The population size at time t derived from an immigrant arriving at time $\tau_j$ has
the probability generating function $F(s, t - \tau_j)$. The total population size at
time t has the probability generating function

$$[F(s, t)]^{n_0} \prod_{j=1}^{N} F(s, t - \tau_j),$$

since each bacterium creates independent families. Assume, however, that
immigration occurs not at fixed times $\tau_j$, but that the $\tau_j$ constitute events of a
Poisson process with parameter r. We want to calculate the probability generating
function of the total population size in terms of F(s, t) and r.

The immigration times $\tau_j$ are random variables and their number N(t)
during the time interval [0, t] is also a random variable whose distribution
is Poisson with parameter rt. Let $Y(t, \tau_j)$ denote the population size
at time t derived from a single immigrant into the colony at time $\tau_j$ for
j = 1, 2, ..., N(t). Then

$$Y(t) = \sum_{j=1}^{N(t)} Y(t, \tau_j)$$

is the population size at time t evolved from immigrants arriving during the
time epoch [0, t].

The probability generating function of Y(t) can be computed by conditioning 
on N(t) in the usual way. This leads to the expression 


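$$E[s^{Y(t)}] = \sum_{k=0}^{\infty} E[s^{Y(t)} \mid N(t) = k]\,\Pr\{N(t) = k\}. \tag{3.1}$$

By virtue of the developments of Section 2, Chapter 4, we know that the joint
distribution in [0, t] of the arrival times $\tau_j$, j = 1, 2, ..., N(t), given the number of
arrivals N(t) = k, is the same as the distribution of the order statistics of k
independent, uniformly distributed random variables over [0, t]. Thus

$$\begin{aligned}
E[s^{Y(t)} \mid N(t) = k] &= \frac{k!}{t^k} \int_0^t d\tau_1 \int_{\tau_1}^t d\tau_2 \cdots \int_{\tau_{k-1}}^t d\tau_k\ E\bigl[s^{\sum_{j=1}^{k} Y(t, \tau_j)}\bigr] \\
&= \frac{1}{t^k} \int_0^t d\tau_1 \int_0^t d\tau_2 \cdots \int_0^t d\tau_k\ E\bigl[s^{\sum_{j=1}^{k} Y(t, \tau_j)}\bigr],
\end{aligned}$$

because the integrand is a symmetric function of $\tau_1, \tau_2, \ldots, \tau_k$. Further, since
different immigrants create independent histories, we get

$$E\bigl[s^{\sum_{j=1}^{k} Y(t, \tau_j)}\bigr] = \prod_{j=1}^{k} E[s^{Y(t, \tau_j)}] = \prod_{j=1}^{k} F(s, t - \tau_j).$$

Therefore

$$\begin{aligned}
E[s^{Y(t)} \mid N(t) = k] &= \frac{1}{t^k} \int_0^t d\tau_1 \int_0^t d\tau_2 \cdots \int_0^t d\tau_k \prod_{j=1}^{k} F(s, t - \tau_j) \\
&= \frac{1}{t^k} \prod_{j=1}^{k} \int_0^t F(s, t - \tau_j)\,d\tau_j \\
&= \left[\frac{1}{t} \int_0^t F(s, t - \tau)\,d\tau\right]^{k}.
\end{aligned}$$

Inserting this formula into (3.1) and taking account of the fact that N(t) is a
Poisson process we obtain

$$\begin{aligned}
E[s^{Y(t)}] &= \sum_{k=0}^{\infty} \left[\frac{1}{t} \int_0^t F(s, t - \tau)\,d\tau\right]^{k} \frac{(rt)^k e^{-rt}}{k!} \\
&= \exp\left\{r \int_0^t [F(s, t - \tau) - 1]\,d\tau\right\}. 
\end{aligned} \tag{3.2}$$

Hence, the probability generating function of the total population size at time
t is

$$G(s, t) = [F(s, t)]^{n_0} \exp\left\{r \int_0^t [F(s, t - \tau) - 1]\,d\tau\right\}. \tag{3.3}$$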


Example. As an example we assume that each individual bacterium follows 
a growth law described by the Yule process X(t) of parameter β > 0 (see Section
2 of Chapter 4). Then the size of the population at time t derived from a single
bacterium at time t = 0 is governed by the probability law

$$P_k(t) = \Pr\{X(t) = k \mid X(0) = 1\},$$

where

$$P_k(t) = e^{-\beta t}(1 - e^{-\beta t})^{k-1}, \qquad k = 1, 2, \ldots.$$

Equivalently the generating function of the Yule process is given by

$$F(s, t) = \sum_{k=1}^{\infty} e^{-\beta t}(1 - e^{-\beta t})^{k-1} s^k = \frac{s e^{-\beta t}}{1 - s + s e^{-\beta t}}.$$

Since we assumed that each immigrant follows the same law of growth as the
original population, the generating function of the population size emanating
from immigrant bacteria, in accordance with (3.2), has the form

$$\begin{aligned}
E[s^{Y(t)}] &= \exp\left\{r \int_0^t \frac{s e^{-\beta(t - \tau)}}{1 - s + s e^{-\beta(t - \tau)}}\,d\tau - rt\right\} \\
&= \exp\left\{-\frac{r}{\beta} \log(1 - s + s e^{-\beta t}) - rt\right\} \\
&= e^{-rt}(1 - s + s e^{-\beta t})^{-r/\beta} = e^{-rt}[1 - (1 - e^{-\beta t})s]^{-r/\beta}.
\end{aligned}$$

If we take account of the original population of bacteria, then by (3.3), with
$[F(s, t)]^{n_0} = s^{n_0} e^{-\beta n_0 t}[1 - (1 - e^{-\beta t})s]^{-n_0}$, the
generating function of population size at time t evolving from the initial
population in addition to the immigrant population is

$$G(s, t) = s^{n_0} \exp[-(r + \beta n_0)t]\,[1 - (1 - e^{-\beta t})s]^{-(n_0 + r/\beta)}.$$

The expected population size at time t is $(\partial/\partial s)G(s, t)|_{s=1}$, which reduces to

$$n_0 e^{\beta t} + \frac{r}{\beta}(e^{\beta t} - 1).$$


Higher-order moments can also be evaluated by further differentiation of the 
generating function. 
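The example can be cross-checked numerically. The sketch below evaluates the integral in (3.2) by Simpson's rule for the Yule generating function and compares it with the closed form, then differentiates (3.3) numerically at s = 1 to obtain the mean. All function names and parameter values are illustrative assumptions; the quadrature and the finite difference are stand-ins for the exact calculus in the text.

```python
import math

def yule_pgf(s, t, beta):
    """Generating function F(s, t) of a Yule process started from one individual."""
    decay = math.exp(-beta * t)
    return s * decay / (1.0 - s + s * decay)

def immigrant_pgf_numeric(s, t, beta, r, steps=2000):
    """Formula (3.2): exp{ r * integral_0^t [F(s, t - tau) - 1] dtau },
    with the integral approximated by composite Simpson (steps must be even)."""
    h = t / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * (yule_pgf(s, t - i * h, beta) - 1.0)
    return math.exp(r * total * h / 3.0)

def immigrant_pgf_closed(s, t, beta, r):
    """Closed form e^{-rt} [1 - (1 - e^{-beta t}) s]^{-r/beta}."""
    return math.exp(-r * t) * (1.0 - (1.0 - math.exp(-beta * t)) * s) ** (-r / beta)

def total_pgf(s, t, beta, r, n0):
    """Formula (3.3): initial colony of size n0 plus immigrant lines."""
    return yule_pgf(s, t, beta) ** n0 * immigrant_pgf_numeric(s, t, beta, r)

def mean_by_difference(t, beta, r, n0, eps=1e-5):
    """Central-difference estimate of dG/ds at s = 1."""
    return (total_pgf(1 + eps, t, beta, r, n0)
            - total_pgf(1 - eps, t, beta, r, n0)) / (2 * eps)

# Hypothetical parameter values, for illustration only.
s, t, beta, r, n0 = 0.6, 1.5, 0.8, 2.0, 3
print(immigrant_pgf_numeric(s, t, beta, r), immigrant_pgf_closed(s, t, beta, r))
print(mean_by_difference(t, beta, r, n0))
```

The first line prints two values that agree to many digits; the second prints the expected population size obtained from (3.3) without any closed-form manipulation.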


4: Stochastic Models of Mutation and Growth 


Often in microbiological populations, initially homogeneous, one or more 
individuals change into a mutant form which then continues reproducing in that 
form. The mutation may correspond, for instance, to immunity from virus 
attack which the descendants inherit or generally to some property distinguish- 
ing the mutant form from the original, “wild-type” colony. We now examine a 
model describing stochastic fluctuations of mutant growth. We assume that the 
mother or parent colony starts with N individuals at time t = 0 and grows
deterministically in such a manner that its size at time t is $Ne^t$. Further we assume
that each “wild-type” individual has a probability ph + o(h) of changing into a
mutant form during the time span [t, t + h]. Since the parent population at time t is of
size $Ne^t$ and taking account of the fact that individuals behave independently,
the probability of the formation of some mutant during [t, t + h] is

$$pNe^{t}h + o(h).$$

Moreover, we postulate that the probability of two or more mutations occurring
during the time span [t, t + h] is o(h). It is clear from the formulation above that
the number of mutations as a function of time describes a nonhomogeneous
Poisson process with intensity function $r(t) = pNe^t$ (see Elementary Problem 12
of Chapter 4).

The developments of Elementary Problems 11 and 12 of Chapter 4
further inform us that the probability generating function of the number of
events occurring during the time epoch [0, t] for the nonhomogeneous Poisson
process of parameter r(t) is

$$\varphi(t, s) = e^{m(t)(s - 1)},$$

where

$$m(t) = \int_0^t r(\tau)\,d\tau = pN(e^t - 1).$$

Thus, in our particular case

$$\varphi(t, s) = \exp[pN(e^t - 1)(s - 1)]. \tag{4.1}$$


Now suppose that each mutant evolves its own growth process, and let F(s, t) 
denote the probability generating function of the number of descendants of a 
single mutant at time t after the creation of the mutant. We shall assume in this 
model that the mutant population undergoes only birth and no death, i.e., 
F(0, t) =0. 

Let H(s, t; N) denote the probability generating function of the number of 
mutants at time t given that the parent colony consisted of N individuals at time 
t = 0 and there were no mutants present in the population at that time. We 
want to determine H(s, t; N) in terms of F(s, t) and the parameters p and N, 
which for the problem at hand are regarded as known. To this end, we introduce 
the probabilities 


$$P_k(t) = \Pr\{\text{there are exactly } k \text{ descendants at time } t \text{ from a single mutation at time } t = 0\},$$

$$h_k(t; N) = \Pr\{\text{there are exactly } k \text{ mutants by time } t \text{ provided the parent population at time } t = 0 \text{ was of size } N \text{ and included no mutants}\}.$$

Then

$$F(s, t) = \sum_{k=1}^{\infty} P_k(t)\,s^k, \tag{4.2}$$

since $P_0(t) = 0$ by assumption, and

$$H(s, t; N) = \sum_{k=0}^{\infty} h_k(t; N)\,s^k. \tag{4.3}$$

Clearly, from (4.1)

$$h_0(t; N) = \Pr\{\text{first mutation occurs after time } t\} = \varphi(t, 0) = \exp[-pN(e^t - 1)] = 1 - K(t; N), \tag{4.4}$$

where K(t; N) is the distribution function of the time of birth of the first mutant
for an initial population of N parents. Its density function is

$$-\frac{dh_0(t; N)}{dt} = pNe^{t} \exp[-pN(e^t - 1)].$$


The event that there exist exactly k mutants (k = 1, 2, ...) at time t
occurs if the first mutation happens at time τ (0 ≤ τ ≤ t), and the line of the first
mutant and the parent population, now of size $Ne^{\tau}$, together account for exactly k
mutants at the end of the remaining time span t − τ. The probability of no mutation
before time τ is $\exp[-pN(e^{\tau} - 1)]$. The occurrence of a mutation during the time
interval (τ, τ + dτ) has probability $pNe^{\tau}\,d\tau + o(d\tau)$. Finally, the probability
that the mutant form and the parent colony, starting with size $Ne^{\tau}$, will produce
together exactly k mutants in time t − τ is

$$\sum_{l=0}^{k} P_{k-l}(t - \tau)\,h_l(t - \tau; Ne^{\tau})$$

(recall $P_0(t) = 0$). But the time τ can be anywhere between 0 and t. Then, by
the law of total probability, conditioning on the time τ of the occurrence of the
first mutant and integrating with respect to the possible values of τ yields

$$h_k(t; N) = \int_0^t \exp[-pN(e^{\tau} - 1)]\,pNe^{\tau} \left[\sum_{l=0}^{k} P_{k-l}(t - \tau)\,h_l(t - \tau; Ne^{\tau})\right] d\tau \qquad \text{for } k = 1, 2, \ldots.$$

With this quantity in hand and (4.4) we may pass to the corresponding
probability generating functions. This leads to the formula

$$\begin{aligned}
H(s, t; N) &= \exp[-pN(e^t - 1)] + \sum_{k=1}^{\infty} s^k \int_0^t \exp[-pN(e^{\tau} - 1)]\,pNe^{\tau} \left[\sum_{l=0}^{k} P_{k-l}(t - \tau)\,h_l(t - \tau; Ne^{\tau})\right] d\tau \\
&= \exp[-pN(e^t - 1)] + \int_0^t \exp[-pN(e^{\tau} - 1)]\,pNe^{\tau} \left[\sum_{l=0}^{\infty} h_l(t - \tau; Ne^{\tau})\,s^l \sum_{k=l}^{\infty} P_{k-l}(t - \tau)\,s^{k-l}\right] d\tau,
\end{aligned}$$

where we have used the hypothesis $P_0(t) = 0$. Referring to (4.2) and (4.3) we can
write this relation in the simpler form

$$H(s, t; N) = \exp[-pN(e^t - 1)] + pN \int_0^t e^{y} \exp[-pN(e^{y} - 1)]\,F(s, t - y)\,H(s, t - y; Ne^{y})\,dy. \tag{4.5}$$


This is an integral equation for H(s, t; N) of a rather complicated form. We may 
solve it, however, by employing the following device. Let ξ(t; N) denote the
number of mutants at time t, when the parent population consists of N
individuals at time t = 0. Since mutations occur according to the laws of a
nonhomogeneous Poisson process and individuals act independently, we easily
infer that ξ(t; N) satisfies the functional equation

$$\xi(t; N_1) + \xi(t; N_2) = \xi(t; N_1 + N_2), \tag{4.6}$$

where the two terms on the left are independent and the equality is understood
in distribution. Because of the independence of $\xi(t; N_1)$ and $\xi(t; N_2)$ and by the
definition

$$H(s, t; N) = E[s^{\xi(t; N)}],$$

we conclude that

$$H(s, t; N_1)\,H(s, t; N_2) = H(s, t; N_1 + N_2)$$

for all nonnegative integers $N_1, N_2, \ldots$. This plainly implies that

$$H(s, t; N) = [H(s, t; 1)]^{N},$$

i.e.,

$$H(s, t; N) = e^{NL(s, t)}, \tag{4.7}$$

where L(s, t) = log H(s, t; 1). We still must determine the function L(s, t). To
this end, we substitute the formula (4.7) into (4.5) and divide by N. This gives

$$\frac{\exp[NL(s, t)] - \exp[-pN(e^t - 1)]}{N} = p \int_0^t e^{y} \exp[-pN(e^{y} - 1)]\,F(s, t - y)\,\exp[Ne^{y}L(s, t - y)]\,dy. \tag{4.8}$$


Now (4.8) presumably is only correct for N a nonnegative integer. However, 
we will operate with (4.8) as if it were valid for all real N > 0. (This can be justi- 
fied by appropriately varying p. We do not enter into details on this point, which 
is rather delicate.) 

Now, let N → 0 in (4.8). Then the right-hand side will approach

$$p \int_0^t e^{y} F(s, t - y)\,dy.$$

For the left-hand side we have

$$\begin{aligned}
\lim_{N \to 0} \frac{\exp[NL(s, t)] - \exp[-pN(e^t - 1)]}{N}
&= \lim_{u \to 0} \left\{\frac{\exp[uL(s, t)] - 1}{u} + \frac{1 - \exp[-pu(e^t - 1)]}{u}\right\} \\
&= \frac{d}{du} \exp[uL(s, t)]\bigg|_{u=0} - \frac{d}{du} \exp[-pu(e^t - 1)]\bigg|_{u=0} \\
&= L(s, t) + p(e^t - 1).
\end{aligned}$$

Hence, formally we have

$$L(s, t) = -p(e^t - 1) + p \int_0^t e^{\tau} F(s, t - \tau)\,d\tau \tag{4.9}$$

and now H(s, t; N) is also determined through (4.7).
To compute the expected number of mutants at time t, let

$$v(t) = \frac{\partial F(s, t)}{\partial s}\bigg|_{s=1}, \tag{4.10}$$

which is the expected number of descendants of a single mutant at time t after
its creation. Then from (4.7) and (4.10) we have

$$E[\xi(t; N)] = \frac{\partial H(s, t; N)}{\partial s}\bigg|_{s=1} = Ne^{NL(1, t)}\,\frac{\partial L(s, t)}{\partial s}\bigg|_{s=1}.$$

But

$$L(1, t) = -p(e^t - 1) + p \int_0^t e^{\tau}\,d\tau = 0,$$

since

$$F(1, t - \tau) = \sum_{k=0}^{\infty} P_k(t - \tau) = 1,$$

and from (4.9) and (4.10)

$$\frac{\partial L(s, t)}{\partial s}\bigg|_{s=1} = p \int_0^t e^{\tau} v(t - \tau)\,d\tau.$$

Hence

$$E[\xi(t; N)] = pN \int_0^t e^{\tau} v(t - \tau)\,d\tau. \tag{4.11}$$

If for not very large t we may approximate

$$v(t) \approx n_0 e^{t} \qquad (n_0 = \text{const.}),$$

then from (4.11)

$$E[\xi(t; N)] \approx pN n_0\,t\,e^{t}.$$
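Formula (4.11) is straightforward to evaluate for any mean-growth function v. The sketch below uses Simpson's rule and, as a check, the exponential approximation v(t) = n0 e^t, for which the integrand is constant and the result equals pN n0 t e^t. The function name `expected_mutants` and the numerical values are hypothetical, introduced only for illustration.

```python
import math

def expected_mutants(t, p, N, v, steps=2000):
    """Formula (4.11): p * N * integral_0^t e^tau * v(t - tau) dtau,
    approximated by composite Simpson (steps must be even).
    `v` is any callable giving the mean size of one mutant line by age."""
    h = t / steps
    total = 0.0
    for i in range(steps + 1):
        tau = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp(tau) * v(t - tau)
    return p * N * total * h / 3.0

# Illustrative values: mutation rate p, initial colony N, growth constant n0.
p, N, n0, t = 1e-6, 10**5, 1.0, 3.0
numeric = expected_mutants(t, p, N, lambda age: n0 * math.exp(age))
closed = p * N * n0 * t * math.exp(t)
print(numeric, closed)
```

Passing a different callable for `v`, for example a slower-growing mutant line, gives the corresponding mean without redoing the integral by hand.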


5: One-Dimensional Geometric Population Growth 


Another example of geometric population growth is the following. Nuclear 
particles are situated on an infinitely long line. When they split their
“offspring” are scattered according to some probability law. More exactly, we
assume that an offspring of a particle situated at x will fall in the interval
(x + y, x + y + dy) with probability density f(y), i.e.,

$$f(y) \ge 0, \quad -\infty < y < \infty, \qquad \int_{-\infty}^{\infty} f(y)\,dy = 1.$$

Note that f(y) depends only on y, the displacement between the parent and child,
and not on the actual location of the parent. Further, for ease of exposition,
assume at first that each particle splits into exactly two new particles. Starting
with one particle at x = 0 we call the offspring of this particle the first
generation; the offspring of the first generation form the second generation, etc. We
define the random variable

$Z_n(x; 0)$ = number of particles in the nth generation that are in (−∞, x],
starting with one particle at x = 0 at the zeroth generation. Put

$$p_k^{(n)}(x) = \Pr\{Z_n(x; 0) = k\}.$$

Suppose we shift the position of the initial particle to u. Let $Z_n(x; u)$ denote
the number of descendants in the nth generation located in (−∞, x] evolving
from one particle initially at u. Because of the spatial homogeneity of the
distribution law for the dispersion of offspring it is intuitively clear that

$$\Pr\{Z_n(x; u) = k\} = \Pr\{Z_n(x - u; 0) = k\} = p_k^{(n)}(x - u). \tag{5.1}$$

The reader should supply the formal proof.
We introduce the generating function

$$g_n(s; x) = \sum_{k=0}^{\infty} p_k^{(n)}(x)\,s^k \tag{5.2}$$

and the mean

$$E_n[x] = E[Z_n(x; 0)] = \sum_{k=0}^{\infty} k\,p_k^{(n)}(x) = g_n'(1; x), \tag{5.3}$$

where the prime indicates the derivative with respect to s.
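The event that there will be exactly k particles in the (n + 1)th generation in
(−∞, x] will occur if the two offspring of the original particle at x = 0 are
located in (u, u + du) and (v, v + dv) (−∞ < u, v < ∞), respectively, and each
will have such a number of descendants n generations later that the total will be
exactly k particles in (−∞, x]. The probability that the two particles of the
first generation are in (u, u + du) and (v, v + dv), respectively, is

$$f(u)\,f(v)\,du\,dv.$$

The probability that these two particles give rise to a total of k descendants in
(−∞, x] n generations later is

$$\sum_{l=0}^{k} p_l^{(n)}(x - u)\,p_{k-l}^{(n)}(x - v)$$

[see (5.1)]. Further, we observe that u and v may be anywhere on the real line,
independently of each other. Hence

$$p_k^{(n+1)}(x) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u)\,f(v) \sum_{l=0}^{k} p_l^{(n)}(x - u)\,p_{k-l}^{(n)}(x - v)\,du\,dv \qquad \text{for } k = 0, 1, 2, \ldots.$$

Passing to the probability generating function we have

$$\begin{aligned}
g_{n+1}(s; x) &= \sum_{k=0}^{\infty} s^k \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u)\,f(v) \sum_{l=0}^{k} p_l^{(n)}(x - u)\,p_{k-l}^{(n)}(x - v)\,du\,dv \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u)\,f(v) \sum_{l=0}^{\infty} p_l^{(n)}(x - u)\,s^l \sum_{k=l}^{\infty} p_{k-l}^{(n)}(x - v)\,s^{k-l}\,du\,dv \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(u)\,g_n(s; x - u)\,g_n(s; x - v)\,f(v)\,du\,dv \\
&= \int_{-\infty}^{\infty} g_n(s; x - u)\,f(u)\,du \int_{-\infty}^{\infty} g_n(s; x - v)\,f(v)\,dv,
\end{aligned}$$

that is,

$$g_{n+1}(s; x) = \left(\int_{-\infty}^{\infty} g_n(s; x - u)\,f(u)\,du\right)^{2}. \tag{5.4}$$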


We now generalize the assumption that each particle splits into two new 
particles and assume instead that each split produces r new particles, where r is a 
fixed positive integer. In place of (5.4) we obtain the formula

$$g_{n+1}(s; x) = \left(\int_{-\infty}^{\infty} g_n(s; x - u)\,f(u)\,du\right)^{r}.$$

If we further generalize the model so that each particle may split into r new
particles with probability $a_r$, then the same method leads to the formula

$$g_{n+1}(s; x) = A\left(\int_{-\infty}^{\infty} g_n(s; x - u)\,f(u)\,du\right),$$

where

$$A(z) = \sum_{r=0}^{\infty} a_r z^r$$

is the generating function of the number of new particles produced in each split.
The expectation $E_{n+1}[x]$ of the number of particles in the (n + 1)th
generation can be computed in the usual way. It becomes

$$E_{n+1}[x] = g_{n+1}'(1; x) = A'\left(\int_{-\infty}^{\infty} g_n(1; x - u)\,f(u)\,du\right) \int_{-\infty}^{\infty} g_n'(1; x - u)\,f(u)\,du. \tag{5.5}$$

Examination of (5.1) indicates that

$$g_n(1; x - u) = \sum_{k=0}^{\infty} p_k^{(n)}(x - u) = 1,$$

and

$$A'(1) = \sum_{r=0}^{\infty} r a_r = m$$

is the expected number of new particles produced in each split. The formula
(5.5) simplifies to the expression

$$E_{n+1}[x] = m \int_{-\infty}^{\infty} E_n[x - u]\,f(u)\,du. \tag{5.6}$$


We may easily solve this recursive relation for $E_n[x]$. Simply regard the original
particle at x = 0 as the zeroth generation. Then obviously

$$E_0[x] = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0. \end{cases}$$

From (5.6)

$$E_1[x] = m \int_{-\infty}^{x} f(u)\,du = mF(x),$$

where F(x) is the cumulative distribution function corresponding to f(x).
Further

$$E_2[x] = m^2 \int_{-\infty}^{\infty} F(x - u)\,f(u)\,du = m^2 F^{(2)}(x),$$

where $F^{(2)}(x)$ is the convolution of F(x) with itself. By induction we plainly infer
that

$$E_n[x] = m^n F^{(n)}(x), \tag{5.7}$$

where $F^{(n)}(x)$ is the n-fold convolution of F(x) with itself.
If the density function f(y) possesses a variance σ² and mean μ then the
central limit theorem tells us that for each fixed ξ

$$F^{(n)}(n\mu + \xi\sigma\sqrt{n}) \to \Phi(\xi), \qquad n \to \infty,$$

where Φ(ξ) is the standard normal distribution function, i.e.,

$$\Phi(\xi) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\xi} \exp(-\eta^2/2)\,d\eta.$$

This obviously leads to an asymptotic formula for $E_n[x]$, namely,

$$\frac{E_n[n\mu + \xi\sigma\sqrt{n}]}{m^n} \to \Phi(\xi) \qquad \text{as } n \to \infty,$$

which is of some independent interest.
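To make (5.7) concrete, the sketch below assumes, purely for illustration, that the displacement density f is normal with mean μ and variance σ². In that special case the n-fold convolution F^(n) is itself a normal distribution function with mean nμ and variance nσ², so the central limit approximation holds exactly for every n. The function names and parameter values are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_count(n, x, m, mu, sigma):
    """E_n[x] = m^n F^(n)(x) from (5.7), under the assumption that f is
    the N(mu, sigma^2) density, so F^(n) is the N(n*mu, n*sigma^2) cdf."""
    if n == 0:
        return 1.0 if x >= 0 else 0.0
    return m**n * normal_cdf((x - n * mu) / (sigma * math.sqrt(n)))

# Illustrative values: binary splitting (m = 2), drift 0.5, unit spread.
m, mu, sigma = 2.0, 0.5, 1.0
for n in (1, 4, 9):
    x = n * mu + 1.0 * sigma * math.sqrt(n)   # the point n*mu + xi*sigma*sqrt(n), xi = 1
    print(n, expected_count(n, x, m, mu, sigma) / m**n)
```

Each printed ratio equals Φ(1), the limit appearing in the asymptotic formula; for a non-normal f the ratio would only approach this value as n grows.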


6: Stochastic Population Growth Model in Space and Time 


Suppose certain plants are distributed in space in accordance with a two- 
dimensional Poisson process with intensity parameter λ. (We consider a
model for the distribution of plants in two-dimensional space but an entirely
parallel development would apply in a three-dimensional formulation.) We
assume that each parent plant whose location is described by the two-dimensional
vector $r_0$ gives birth, independently of other plants, to a random number
of progeny with a probability generating function H(s); that is,

$$H(s) = \sum_{k=0}^{\infty} h_k s^k$$

and $h_k$ denotes the probability that a parent plant will produce k offspring.
Assume further that the progeny of one parent plant located at $r_0$ are distributed
independently in space around $r_0$ in accordance with the two-dimensional
density function $f(r - r_0)$ which depends only on the vector $r - r_0$; e.g., the
two-dimensional normal density function

$$f(r - r_0) = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{1}{2\sigma^2}\left[(x - x_0)^2 + (y - y_0)^2\right]\right\}.$$

Thus the probability that a given progeny of a parent at $r_0$ will be found in the
region R is

$$p = \int_R f(r - r_0)\,dr. \tag{6.1}$$
R 




If a parent plant at $r_0$ has exactly n progeny, then the number of offspring in
region R due to this single parent will follow the binomial distribution with
parameters p and n, where p is given by (6.1). However, the number of progeny
of a single parent is a random variable with probability generating function
H(s). The binomial distribution has generating function $[1 + p(s - 1)]^n$, so
compounding by the law of total probability gives $\sum_n h_n [1 + p(s - 1)]^n$; that is,
the probability generating function of the
number of progeny in R due to a single parent plant at $r_0$ is H[1 + p(s − 1)],
where p is given by (6.1). Our objective is to calculate the probability generating
function of the number of progeny in region R produced by all the parent plants
located in a region S. For this purpose, we introduce the following notation. Let

X(S) = number of parent plants in region S,
$Y(r_0, R)$ = number of progeny in R due to a single parent plant at $r_0$,
Y(S, R) = number of progeny in R produced by all parents in S.

Then

$$Y(S, R) = \sum_{r_0 \in S} Y(r_0, R).$$


If S is a bounded region the sum is finite with probability one since the number
of parents in S follows a Poisson distribution with parameter λA(S) [A(S)
denotes the area of S]. Further, X(S) describes a two-dimensional Poisson
process with intensity parameter λ. The generating function of Y(S, R) will be
calculated by conditioning on the values of X(S). Thus

$$g(s) = E[s^{Y(S, R)}] = \sum_{k=0}^{\infty} E[s^{Y(S, R)} \mid X(S) = k]\,\Pr\{X(S) = k\}. \tag{6.2}$$

Since the parent plants produce independently, we have

$$E[s^{Y(S, R)} \mid X(S) = k] = \{E[s^{Y(S, R)} \mid X(S) = 1]\}^{k}. \tag{6.3}$$

Moreover, from the theory of spatial Poisson processes we know that under
the condition X(S) = 1,

$$Y(S, R) = Y(r_0, R),$$

where $r_0$ is uniformly distributed in S. Then

$$E[s^{Y(S, R)} \mid X(S) = 1] = E[s^{Y(r_0, R)} \mid r_0 \text{ uniformly distributed in } S] = \frac{1}{A(S)} \int_S H[1 + p(s - 1)]\,dr_0, \tag{6.4}$$


where p is given by (6.1). The final equality is valid because we have shown 
$H[1 + p(s - 1)]$ to be the probability generating function of the number of 
progeny in R due to a single parent located at a point $r_0$. Here, however, $r_0$ is 
uniformly distributed in S and so (6.4) results. Now combining (6.2)-(6.4) 
yields 

$$g(s) = \sum_{k=0}^{\infty} \left\{\frac{1}{A(S)} \int_S H[1 + p(s - 1)]\,dr_0\right\}^k \frac{e^{-\lambda A(S)}[\lambda A(S)]^k}{k!},$$

as X(S) is Poisson distributed with intensity parameter $\lambda$. This simplifies to the 
formula 

$$g(s) = \exp\left\{\lambda \int_S \left(H[1 + p(s - 1)] - 1\right) dr_0\right\}, \tag{6.5}$$

where 

$$p = \int_R f(r - r_0)\,dr.$$


In the final expression it is meaningful to let S be the whole two-dimensional 
space. 

Formula (6.5) also holds if the plants are distributed in space with an 
underlying three-dimensional spatial Poisson distribution and $r$, $r_0$ and R, S 
denote three-dimensional vectors and regions, respectively. In either the two- 
or the three-dimensional case we may approximate p by 

$$f(r - r_0)A(R),$$

provided the region R is very small. Then (6.5) reduces to 

$$g(s) \approx \exp\left\{\lambda \int_S \left(H[1 + f(r - r_0)A(R)(s - 1)] - 1\right) dr_0\right\}.$$

If S is the whole space (two or three dimensional) then we may write 

$$g(s) \approx \exp\left\{\lambda \int \left(H[1 + f(u)A(R)(s - 1)] - 1\right) du\right\}, \tag{6.6}$$

where the integration (double or triple) is over the whole space. 
As an example we take the normal distribution in the plane for the function 
f, i.e., 

$$f(u) = \frac{1}{2\pi\sigma^2} \exp\left[-\frac{1}{2\sigma^2}(x^2 + y^2)\right], \qquad \text{where } u = (x, y),$$

and let the probability distribution of the number of progeny of a single parent 
be 

$$h_k = \mu^k(1 - \mu), \qquad k = 0, 1, 2, \ldots,$$

where $\mu$ is a constant, $0 < \mu < 1$. Then $H(s) = (1 - \mu)\sum_{k=0}^{\infty} \mu^k s^k = (1 - \mu)/(1 - \mu s)$. 
Substituting into (6.6) and simplifying, we obtain 

$$g(s) \approx \exp\left\{\lambda \iint \frac{\mu A(R)(s - 1)\exp[-(2\sigma^2)^{-1}(x^2 + y^2)]}{2\pi\sigma^2(1 - \mu) - \mu A(R)(s - 1)\exp[-(2\sigma^2)^{-1}(x^2 + y^2)]}\,dx\,dy\right\},$$




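which is an appropriate approximation for A(R) small. After we switch to polar 
coordinates r and $\theta$, the expression for g(s) becomes 

$$\begin{aligned}
g(s) &\approx \exp\left\{2\pi\lambda \int_0^{\infty} \frac{\mu A(R)(s - 1)\, r \exp(-r^2/2\sigma^2)}{2\pi\sigma^2(1 - \mu) - \mu A(R)(s - 1)\exp(-r^2/2\sigma^2)}\,dr\right\} \\
&= \exp\left\{2\pi\lambda\sigma^2 \int_0^1 \frac{\mu A(R)(s - 1)}{2\pi\sigma^2(1 - \mu) - \mu A(R)(s - 1)z}\,dz\right\} \qquad (\text{where } z = \exp(-r^2/2\sigma^2)) \\
&= \exp\left\{-2\pi\lambda\sigma^2 \log \frac{2\pi\sigma^2(1 - \mu) - \mu A(R)(s - 1)}{2\pi\sigma^2(1 - \mu)}\right\} \\
&= [1 - \varphi(s - 1)]^{-k},
\end{aligned}$$

where 

$$\varphi = \frac{\mu A(R)}{2\pi\sigma^2(1 - \mu)} \qquad \text{and} \qquad k = 2\pi\lambda\sigma^2.$$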


This is the probability generating function of a negative binomial distribution. 
7: Deterministic Population Growth with Age Distribution 


In this section we shall discuss some simple deterministic models of population 
growth which take account of the age structure of the population. The stochastic 
version of these growth processes is quite complicated and beyond the scope of 
the text. 


A. A SIMPLE GROWTH MODEL 
We begin by considering a single species, and we let 
N(t) = population size at time t; 


v(t) dt = number of offspring produced by each individual in the “short” 
interval (t, t + dt). More precisely, the number of offspring produced 
by an individual in the interval (t, t + h) is v(t)h + o(h). 


Then 

$$N(t + h) = N(t) + N(t)v(t)h + o(h),$$

$$\frac{N(t + h) - N(t)}{h} = N(t)v(t) + \frac{o(h)}{h}.$$

Taking the limit of both sides as $h \to 0$, we obtain 

$$\frac{dN(t)}{dt} = v(t)N(t). \tag{7.1}$$

Its solution is 

$$N(t) = N(0) \exp\left(\int_0^t v(\tau)\,d\tau\right), \tag{7.2}$$

where N(0) denotes the initial population size. If the integral $\int_0^t v(\tau)\,d\tau$ diverges 
as $t \to \infty$, then the population grows to infinity. If v(t) is constant then $N(t) = N(0)e^{vt}$, 
and the population grows to infinity exponentially at the rate v. 


B. A MODEL IN WHICH POPULATION SIZE DETERS GROWTH 


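In the above model an increase in size of the population does not deter growth. 
We now take population size into account by letting v(t) depend upon N(t). 
Specifically, suppose 

$$v(t) = \begin{cases} \beta\left(1 - \dfrac{N(t)}{\alpha}\right) & \text{for } N(t) < \alpha, \\[4pt] 0 & \text{otherwise,} \end{cases}$$

where $\alpha$ and $\beta$ are positive numbers. Note that the population size cannot grow 
beyond $\alpha$. In this case (7.1) becomes 

$$\frac{dN(t)}{dt} = \beta N(t)\left[1 - \frac{N(t)}{\alpha}\right] = \beta N(t) - \frac{\beta}{\alpha} N^2(t). \tag{7.3}$$

We can separate variables and the solution of (7.3) becomes 

$$N(t) = \frac{\alpha N(0)e^{\beta t}}{\alpha + N(0)(e^{\beta t} - 1)} = \frac{\alpha N(0)}{\alpha e^{-\beta t} + N(0) - N(0)e^{-\beta t}}. \tag{7.4}$$

Inspection of the second expression of (7.4) reveals that $N(t) \to \alpha$ as $t \to \infty$. 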
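The closed form (7.4) can be checked against a direct numerical integration of (7.3). The following is a minimal sketch, not part of the original text: the function names, the Runge-Kutta step count, and the parameter values are illustrative assumptions.

```python
import math


def logistic_closed_form(t, n0, alpha, beta):
    """First expression in (7.4): solution of dN/dt = beta*N*(1 - N/alpha)."""
    growth = math.exp(beta * t)
    return alpha * n0 * growth / (alpha + n0 * (growth - 1.0))


def logistic_rk4(t_end, n0, alpha, beta, steps=10_000):
    """Integrate (7.3) with the classical fourth-order Runge-Kutta scheme."""
    def rhs(n):
        return beta * n * (1.0 - n / alpha)

    h = t_end / steps
    n = n0
    for _ in range(steps):
        k1 = rhs(n)
        k2 = rhs(n + 0.5 * h * k1)
        k3 = rhs(n + 0.5 * h * k2)
        k4 = rhs(n + h * k3)
        n += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return n


if __name__ == "__main__":
    n0, alpha, beta = 5.0, 100.0, 0.8  # illustrative values only
    for t in (1.0, 5.0, 20.0):
        print(t, logistic_closed_form(t, n0, alpha, beta),
              logistic_rk4(t, n0, alpha, beta))
```

For large t both columns approach the ceiling alpha, in line with the remark following (7.4).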


C. EFFECT OF AGE STRUCTURE 


We shall now consider the effect of age structure on a growing population. We 
need the following notation: 


p(u, t) = the frequency function of individuals of age u in 
the population at time t, i.e., p(u, t) has the 
property that $\int_{u_1}^{u_2} p(u, t)\,du$ = proportion of 
individuals in the population at time t who are in 
the age range $(u_1, u_2)$. The actual number of 
individuals in this age bracket is, of course, N(t) 
times this proportion. (7.5) 

b(t) = the rate at which new individuals are being 
created in the population at time t. More 
explicitly, $\int_{t_1}^{t_2} b(t)\,dt$ = number of new individuals 
created in the time interval $(t_1, t_2)$. (7.6) 

$\lambda(u)\,dt$ = expected number of progeny of a single individual 
of age u in the next dt units of time. (7.7) 

$l(u)$ = probability that an individual will survive, from 
birth, at least u units of time. (7.8) 

c(u) = the infinitesimal death rate, i.e., the probability 
that an individual of age u will die in the next h 
units of time is c(u)h + o(h). (7.9) 


The relation between $l(\cdot)$ and $c(\cdot)$ may be derived as follows: For given u, 
h > 0, an individual will survive, from birth, at least u + h units of time if and 
only if he survives, from birth, at least u units of time, and then does not die in 
the following h units of time. Thus 

$$l(u + h) = l(u)[1 - c(u)h] + o(h)$$

and 

$$\frac{l(u + h) - l(u)}{h} = \frac{-l(u)c(u)h + o(h)}{h}.$$

Taking limits as $h \to 0$, we have 

$$\frac{dl(u)}{du} = -l(u)c(u).$$

Solving, we obtain 

$$l(u) = l(0) \exp\left[-\int_0^u c(\xi)\,d\xi\right] = \exp\left[-\int_0^u c(\xi)\,d\xi\right], \tag{7.10}$$

since $l(0) = 1$. 

In considering the effect of age structure on a growing population, our 
interest will center on b(t), i.e., we may regard $\lambda(u)$, $l(u)$, and c(u) as known, and 
the problem is to determine b(t). The rate at which new individuals are being 
created in the population at time t has two components. One component, 
$b_0(t)$, say, is the rate of creation due to those individuals in the population at 
time t who already existed at time zero. The density of individuals of age u in the 
population at time zero is p(u, 0). The probability that an individual of age u at 
time zero will survive to time t [at which time he will be of age (t + u)] is 
$l(t + u)/l(u)$. Hence the proportion of those of age u that survive to time t is 
$[l(t + u)/l(u)]p(u, 0)$. The rate of births for individuals of age t + u is $\lambda(t + u)$. 
Now, adding over all ages, we obtain 

$$b_0(t) = N(0) \int_0^{\infty} \lambda(t + u)\,\frac{l(t + u)}{l(u)}\,p(u, 0)\,du. \tag{7.11}$$

The other component of b(t) is the rate of creation of new individuals at time t 
due to those individuals in the population who were born after time zero. The 
rate at which new individuals are being created in the population at time $\tau$ is 
$b(\tau)$. For $0 < \tau < t$, the probability that an individual born at time $\tau$ will 
survive to time t, at which time he will be of age $t - \tau$, is $l(t - \tau)$. The rate of 
births for individuals of age $t - \tau$ is $\lambda(t - \tau)$. It follows that 

$$b(t) = b_0(t) + \int_0^t \lambda(t - \tau)\,l(t - \tau)\,b(\tau)\,d\tau, \tag{7.12}$$

where $b_0(t)$ is given by (7.11). 
The relation (7.12) is a renewal equation (see Chapter 5). Its solution can be 
obtained by iteration. 


Example. Assume that both the birth rate and the infinitesimal death rate are 
constants, independent of age, i.e., $\lambda(u) = \lambda$, $c(u) = c$. Then, by (7.10), the 
probability of an individual living to age u is 

$$l(u) = e^{-cu}. \tag{7.13}$$

Suppose that the population started with the creation of a single individual at 
time zero. Then, 

$$N(0) \int_0^{\infty} p(u, 0)\,du = 1. \tag{7.14}$$

Actually, the age density p(u, 0) should be replaced by a degenerate distribution 
concentrated at u = 0. From (7.11), (7.13), and (7.14), we conclude that 

$$b_0(t) = \lambda e^{-ct}. \tag{7.15}$$

Hence by (7.13) and (7.15), (7.12) becomes 

$$b(t) = \lambda e^{-ct} + \lambda \int_0^t e^{-c(t - \tau)}\,b(\tau)\,d\tau. \tag{7.16}$$

We wish to solve (7.16) for the function $b(\cdot)$. Multiply both sides of the 
equation by $e^{ct}$ to obtain 

$$e^{ct}b(t) = \lambda + \lambda \int_0^t e^{c\tau}\,b(\tau)\,d\tau$$

and let 

$$f(t) = e^{ct}b(t). \tag{7.17}$$

Then the equation, to be solved now for $f(\cdot)$, becomes 

$$f(t) = \lambda + \lambda \int_0^t f(\tau)\,d\tau.$$

Clearly $f(0) = \lambda$, and upon differentiating both sides with respect to t we obtain 
$f'(t) = \lambda f(t)$. Thus $f(t) = \lambda e^{\lambda t}$, and, inserting this expression in (7.17), we have 

$$b(t) = \lambda e^{(\lambda - c)t}. \tag{7.18}$$

Having determined b(t) in this example, we now utilize this result to determine 
the age structure as given by N(t)p(u, t). Since we have assumed, in deriving 
(7.18), that the population started with the creation of a single individual at time 
zero, we need only consider the case $u \le t$. An individual is of age u at time t if 
and only if it was created at time t - u. The rate at which new individuals are 
being created in the population at time t - u is $b(t - u) = \lambda e^{(\lambda - c)(t - u)}$. The 
probability that an individual will live to at least age u is, by (7.13), $e^{-cu}$. It 
follows that 

$$N(t)p(u, t) = e^{-cu}\,b(t - u), \qquad u \le t. \tag{7.19}$$

Substituting (7.18) into (7.19), we obtain 

$$N(t)p(u, t) = e^{-cu}\,\lambda e^{(\lambda - c)(t - u)} = \lambda e^{(\lambda - c)t}e^{-\lambda u}, \qquad u \le t. \tag{7.20}$$
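The statement that a renewal equation such as (7.12) can be solved step by step can be illustrated numerically on the example (7.16), whose exact solution (7.18) is known. The sketch below is an assumption-laden illustration rather than the book's method: it discretises the integral with the trapezoidal rule, and the names `solve_renewal`, `forcing`, and `kernel` are hypothetical.

```python
import math


def solve_renewal(forcing, kernel, t_end, steps):
    """Approximate b on a uniform grid, where
    b(t) = forcing(t) + integral_0^t kernel(t - tau) * b(tau) d tau,
    using the trapezoidal rule and solving for b(t_i) at each step."""
    h = t_end / steps
    k0 = kernel(0.0)
    b = [forcing(0.0)]
    for i in range(1, steps + 1):
        t = i * h
        acc = 0.5 * kernel(t) * b[0]          # end point tau = 0, weight 1/2
        for j in range(1, i):
            acc += kernel(t - j * h) * b[j]   # interior points, weight 1
        # The end point tau = t contributes 0.5*h*k0*b(t); move it to the left side.
        b.append((forcing(t) + h * acc) / (1.0 - 0.5 * h * k0))
    return b


if __name__ == "__main__":
    lam, c = 1.0, 0.4                         # illustrative constant birth and death rates
    grid = solve_renewal(lambda t: lam * math.exp(-c * t),
                         lambda s: lam * math.exp(-c * s),
                         t_end=2.0, steps=400)
    print(grid[-1], lam * math.exp((lam - c) * 2.0))   # numerical value vs (7.18)
```

The same routine accepts any forcing term $b_0(t)$ and kernel $\lambda(u)l(u)$, so it can be reused for age-dependent rates where no closed form is available.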


We return now to the general formulation, keeping the assumption that the 
population started with the birth of a single individual. Then the derivation of 
(7.19) holds in general and for u < t we have 


N(t)p(u, t) = b(t — u)l(u). (7.21) 


Here, b(t) is determined as the solution of the renewal equation (7.12) under the 
special circumstance $b_0(t) = 0$, since there were no individuals living at time 
t = 0. Thus, 

$$b(t) = \int_0^t \lambda(t - \tau)\,l(t - \tau)\,b(\tau)\,d\tau = \int_0^t b(t - u)\,\lambda(u)\,l(u)\,du.$$

Let $\varphi(u) = \lambda(u)l(u)$. Then, the equation becomes 

$$b(t) = \int_0^t b(t - u)\,\varphi(u)\,du. \tag{7.22}$$

As a tentative solution of (7.22), we try 

$$b(t) = e^{yt}, \tag{7.23}$$

where y is a constant to be determined so that (7.22) holds for t large. Inserting 
(7.23) into (7.22) leads to the condition 

$$e^{yt} = \int_0^t e^{y(t - u)}\,\varphi(u)\,du = e^{yt} \int_0^t e^{-yu}\,\varphi(u)\,du$$

or, equivalently, 

$$\int_0^t e^{-yu}\,\varphi(u)\,du = 1. \tag{7.24}$$

We are interested in the asymptotic age structure of the population as $t \to \infty$. 
Letting $t \to \infty$ in (7.24) leads to the condition 

$$R(y) = \int_0^{\infty} e^{-yu}\,\varphi(u)\,du = 1. \tag{7.25}$$

FIG. 1 (graph of R(y)) 


From its definition we find that R(y) is a strictly decreasing function of y; hence 
(7.25) has at most one positive root. Let 

$$R = \int_0^{\infty} \varphi(u)\,du = \int_0^{\infty} \lambda(u)\,l(u)\,du.$$

R is referred to as the reproductive value of an individual; it is his expected 
number of offspring during his lifetime. R is sometimes called the Malthusian 
rate. 

If R > 1, then R(y) has the form shown in Fig. 1 and a solution $y_0 > 0$ of 
(7.25) exists. 

In this case b(t) is asymptotically proportional to $\exp(y_0 t)$, and the population 
grows exponentially. If R < 1, then R(y) has the form shown in Fig. 2 and a 
solution $y_0 < 0$ of (7.25) exists. In this case b(t) is asymptotically proportional 
to $\exp(y_0 t)$, so that the population dies out exponentially fast. 

If R = 1, then the problem must be studied stochastically. 
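Because R(y) is strictly decreasing, the root $y_0$ of (7.25) can be located by bisection. The sketch below assumes that $\varphi(u) = \lambda(u)l(u)$ vanishes beyond a finite age `u_max`, so the integral can be truncated; the function names, the quadrature rule, the bracket, and the sample $\varphi$ are all illustrative choices, not taken from the text.

```python
import math


def net_reproduction(y, phi, u_max, n=2000):
    """Midpoint-rule approximation of R(y) = integral_0^u_max exp(-y*u) * phi(u) du."""
    h = u_max / n
    return h * sum(math.exp(-y * (i + 0.5) * h) * phi((i + 0.5) * h)
                   for i in range(n))


def growth_exponent(phi, u_max, lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection for y0 with R(y0) = 1, using that R is strictly decreasing in y."""
    if not (net_reproduction(lo, phi, u_max) > 1.0 > net_reproduction(hi, phi, u_max)):
        raise ValueError("root of R(y) = 1 is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_reproduction(mid, phi, u_max) > 1.0:
            lo = mid        # R(mid) is still too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical species: births at rate 0.5 between ages 1 and 4, death rate 0.1.
    def phi(u):
        return 0.5 * math.exp(-0.1 * u) if 1.0 <= u <= 4.0 else 0.0

    print("R   =", net_reproduction(0.0, phi, 4.0))
    print("y_0 =", growth_exponent(phi, 4.0))
```

With these sample rates R exceeds 1, so the computed $y_0$ is positive; lowering the birth rate until R < 1 moves the root to the negative axis, matching the two cases described above.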

The results presented above are rather heuristic. The assumptions and 
analysis necessary to rigorize the arguments are beyond the scope of this book. 
The main conclusion is that under suitable conditions we expect the growth 
rate of the population to behave asymptotically like an exponential. Further 
evidence of this phenomenon will be forthcoming when we discuss a discrete 
model below. 

FIG. 2 (graph of R(y)) 

We continue in the same heuristic manner. For the case R > 1, we shall 
determine the asymptotic age structure as given by the density function p(u, t). 
By (7.21) 

$$p(u, t) = \frac{b(t - u)\,l(u)}{N(t)}, \qquad u \le t.$$

Now b(t - u) is asymptotically proportional to $\exp[y_0(t - u)]$; hence, N(t) 
is asymptotically proportional to $\exp(y_0 t)$. Therefore p(u, t) is asymptotically 
proportional to 

$$\frac{\exp[y_0(t - u)]\,l(u)}{\exp(y_0 t)} = \exp(-y_0 u)\,l(u).$$

The factor of proportionality is determined from the fact that p(u, t) is the density 
function of a probability distribution. Therefore, the age structure of the 
population for large t is given by the asymptotic density function 

$$p(u, t) \approx \frac{\exp(-y_0 u)\,l(u)}{\int_0^{\infty} \exp(-y_0 x)\,l(x)\,dx}.$$


8: A Discrete Aging Model 


We now give a precise treatment of the problem of the previous section for a 
discrete time model. Let t = 0, 1, 2,..., and let 


$n_x^{(t)}$ = number of individuals of age x at time t, 
$P_x$ = proportion of individuals of age x surviving to age x + 1, 
$F_x$ = number born, in the next unit of time, to each parent of age x. 

Here $F_x$ is assumed independent of t, and $P_x$ and $F_x$ are positive for x = 0, 1, 
2, ..., m. Assume that no one lives beyond age m; i.e., set $P_m = 0$. Then the 
transition relationship for age structure between t = 0 and t = 1 is given by 

$$\begin{aligned}
n_0^{(1)} &= \sum_{x=0}^{m} F_x n_x^{(0)}, \\
n_1^{(1)} &= P_0 n_0^{(0)}, \\
n_2^{(1)} &= P_1 n_1^{(0)}, \\
&\;\;\vdots \\
n_m^{(1)} &= P_{m-1} n_{m-1}^{(0)}.
\end{aligned}$$

We write this transition relationship in matrix form as 

$$n^{(1)} = M n^{(0)}, \qquad n^{(0)} = (n_0^{(0)}, n_1^{(0)}, \ldots, n_m^{(0)}), \qquad n^{(1)} = (n_0^{(1)}, n_1^{(1)}, \ldots, n_m^{(1)}), \tag{8.1}$$

where the matrix M is 

$$M = \begin{pmatrix}
F_0 & F_1 & F_2 & \cdots & F_{m-1} & F_m \\
P_0 & 0 & 0 & \cdots & 0 & 0 \\
0 & P_1 & 0 & \cdots & 0 & 0 \\
\vdots & & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & P_{m-1} & 0
\end{pmatrix}. \tag{8.2}$$

M is a matrix with nonnegative elements. Properties of such matrices are given 
in Section 2 of the Appendix in A First Course. Since $P_x$ and $F_x$ do not depend 
upon time, the same transition relationship holds between any two consecutive 
times. Thus we may iterate formula (8.1) and obtain 

$$n^{(t)} = M^t n, \qquad n^{(t)} = (n_0^{(t)}, n_1^{(t)}, \ldots, n_m^{(t)}), \qquad t = 1, 2, \ldots, \tag{8.3}$$

where we wrote 

$$n_x = n_x^{(0)} \qquad \text{for } x = 0, 1, \ldots, m.$$


For sufficiently large t, all of the elements of $M^t$ are strictly positive. Also, there 
exists an eigenvalue $\lambda_0 > 0$ which is strictly greater in absolute value than any 
other eigenvalue (Theorem 2.2 of the Appendix). For any vector n, $M^t n$ is 
asymptotically equal to $\lambda_0^t z$, where z is a certain multiple of the unique right-hand 
eigenvector corresponding to the eigenvalue $\lambda_0$. Asymptotically, as $t \to \infty$, (8.3) 
becomes 

$$n^{(t)} = M^t n \sim \lambda_0^t z = [\exp(t \log \lambda_0)]\,z,$$

and $\log \lambda_0$ corresponds to the critical value $y_0$ introduced in the heuristic 
continuous time analysis. The population grows exponentially if $\lambda_0 > 1$, and dies 
at an exponential rate if $\lambda_0 < 1$. 
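The asymptotic statement can be observed by iterating (8.1) directly. The sketch below avoids forming the matrix M and applies its action to the age vector; the function names and the fertility and survival values are made-up illustrations, not data from the text.

```python
def leslie_step(n, fertility, survival):
    """One application of (8.1): newborns from every age class, then survivors
    moved up one age class. `survival` holds P_0, ..., P_{m-1}."""
    newborns = sum(f * x for f, x in zip(fertility, n))
    return [newborns] + [p * x for p, x in zip(survival, n[:-1])]


def growth_factor(n, fertility, survival, steps=200):
    """Estimate the dominant eigenvalue as the ratio of successive total population
    sizes, renormalising the age vector at every step to avoid overflow."""
    ratio = float("nan")
    for _ in range(steps):
        nxt = leslie_step(n, fertility, survival)
        ratio = sum(nxt) / sum(n)
        total = sum(nxt)
        n = [x / total for x in nxt]
    return ratio, n            # n approximates the stable age distribution


if __name__ == "__main__":
    fertility = [0.5, 1.0, 0.75]   # F_0, F_1, F_2 (hypothetical)
    survival = [0.8, 0.5]          # P_0, P_1 (hypothetical)
    lam0, stable = growth_factor([10.0, 0.0, 0.0], fertility, survival)
    print("dominant eigenvalue:", lam0)
    print("stable age distribution:", stable)
```

For these sample values the estimate exceeds 1, so the population grows; the printed vector is the normalised limit of the age distribution, that is, the direction of z.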


9: Compound Poisson Processes 


Let $\{Y_k\}_{k=1}^{\infty}$ be a sequence of independent identically distributed random 
variables having the common distribution function F and characteristic function $\varphi$. 
Let $\{N(t), t \ge 0\}$ be a Poisson process with parameter $\lambda > 0$, and independent 
of the sequence $\{Y_k\}$. The stochastic process 

$$X(t) = \sum_{k=1}^{N(t)} Y_k, \qquad t \ge 0,$$

is called a compound Poisson process. 
Compound Poisson processes are used to model a large variety of physical 
situations. Typically, “events” occur in accordance with a Poisson process and 
each event has some randomly determined “value” associated with it. Here are 
some examples: 


(a) Let N(t) be the number of claims against an insurance company up to 
time t and suppose $Y_i$ is the amount of the ith claim. Then X(t) is the 
accumulation of money claims up to time t. 

(b) Let N(t) be the number of transactions in a publicly traded security, 
say a share of stock, up to time t, and suppose $Y_i$ is the number of shares traded 
in the ith transaction. Then X(t) is the total number of shares traded up to time t. 

(c) Let N(t) be the number of shocks that have occurred in a system up to 
time t and suppose $Y_i$ is the damage or wear caused by the ith shock. Postulating 
that damage is additive, X(t) is the cumulative damage to the system up to time t. 


STATIONARY INDEPENDENT INCREMENTS 


A compound Poisson process has stationary independent increments. We confirm 
this property in two steps, first showing that nonoverlapping increments are 
independent. Let $0 \le t_0 < t_1 < \cdots < t_n$ be given. Since $Y_1, Y_2, \ldots$ are mutually 
independent and independent of the Poisson process, which itself has independent 
increments, the random variables 

$$\begin{aligned}
&\{Y_1, \ldots, Y_{N(t_0)};\; N(t_0)\}, \\
&\{Y_{N(t_0)+1}, \ldots, Y_{N(t_1)};\; N(t_1) - N(t_0)\}, \\
&\qquad\vdots \\
&\{Y_{N(t_{n-1})+1}, \ldots, Y_{N(t_n)};\; N(t_n) - N(t_{n-1})\}
\end{aligned}$$

are independent. Hence the increments 

$$\begin{aligned}
X(t_0) - X(0) &= Y_1 + \cdots + Y_{N(t_0)}, \\
X(t_1) - X(t_0) &= Y_{N(t_0)+1} + \cdots + Y_{N(t_1)}, \\
&\;\;\vdots \\
X(t_n) - X(t_{n-1}) &= Y_{N(t_{n-1})+1} + \cdots + Y_{N(t_n)}
\end{aligned}$$

in the compound Poisson process are independent. 

To show that the distribution of the increments is stationary, let $t \ge 0$ and 
h > 0 be given. Then N(t + h) - N(t) has the same distribution as N(h) and 
$Y_1, \ldots, Y_{N(h)}$ have the same distribution as $Y_{N(t)+1}, \ldots, Y_{N(t+h)}$. Thus 

$$X(t + h) - X(t) = Y_{N(t)+1} + \cdots + Y_{N(t+h)}$$

has the same distribution as 

$$X(h) = Y_1 + \cdots + Y_{N(h)}.$$

This completes the proof that a compound Poisson process has stationary 
independent increments. 

Thus, a compound Poisson process is Markov (Elementary Problem 7, 
Chapter 1). The process is a Markov chain only if the possible values of the 
summands form a discrete set, say, $\{0, \pm 1, \pm 2, \ldots\}$. If the random variables 
$Y_1, Y_2, \ldots$ are not discrete, we have a general Markov process with an 
uncountable state space. 


THE CHARACTERISTIC FUNCTION 


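Since N(t) follows a Poisson distribution with parameter $\lambda t$, its generating 
function is 

$$g_{N(t)}(s) = e^{-\lambda t(1 - s)}.$$

Note that X(t) is a random sum of random variables. Referring to the results 
of Section 1E in Chapter 1, we have 

$$\begin{aligned}
\varphi_{X(t)}(u) &= g_{N(t)}[\varphi(u)] \qquad (\text{where } \varphi(u) = E[e^{iuY_1}]) \\
&= \exp\{-\lambda t[1 - \varphi(u)]\}, \qquad -\infty < u < +\infty. 
\end{aligned} \tag{9.1}$$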


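Next we write down the joint characteristic function of the process at any finite 
set of time points $0 < t_1 < \cdots < t_n$. Reliance on the property that $\{X(t), t \ge 0\}$ 
has stationary independent increments is vital. The joint characteristic function 
becomes 

$$\begin{aligned}
\varphi_{X(t_1), \ldots, X(t_n)}(u_1, \ldots, u_n) &= E[\exp\{iu_1 X(t_1) + \cdots + iu_n X(t_n)\}] \\
&= E\left[\exp\left\{i \sum_{k=1}^{n} a_k[X(t_k) - X(t_{k-1})]\right\}\right] \qquad (\text{where } a_k = u_k + \cdots + u_n \text{ and } t_0 = 0) \\
&= \prod_{k=1}^{n} E[\exp\{ia_k[X(t_k) - X(t_{k-1})]\}] \\
&= \prod_{k=1}^{n} E[\exp\{ia_k X(t_k - t_{k-1})\}] \\
&= \prod_{k=1}^{n} \varphi_{X(t_k - t_{k-1})}(a_k). 
\end{aligned} \tag{9.2}$$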
k=1 


Since the joint distributions of arbitrary finite sets $\{X(t_1), \ldots, X(t_n)\}$ characterize 
completely the stochastic process, we have established the following theorem. 


Theorem 9.1. Let $\{X(t), t \ge 0\}$ be a stochastic process having stationary 
independent increments and for which X(0) = 0. Then $\{X(t), t \ge 0\}$ is a compound 
Poisson process if and only if the characteristic function $\varphi_{X(t)}$ of X(t) is of the form 

$$\varphi_{X(t)}(u) = \exp\{-\lambda t[1 - \varphi(u)]\}, \qquad -\infty < u < +\infty,$$

where $\lambda > 0$ and $\varphi$ is a characteristic function. 

The mean and variance of X(t) may be computed routinely by differentiating 
the characteristic function. The results are 

$$E[X(t)] = \mu\lambda t$$

and 

$$\text{variance of } X(t) = (\sigma^2 + \mu^2)\lambda t,$$

where $\mu$ and $\sigma^2$ are the mean and variance, respectively, of $Y_1$. 
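The mean and variance formulas can be compared with a simulation. The following sketch assumes exponentially distributed values with mean 2 (so that the mean and variance of $Y_1$ are 2 and 4) and generates the Poisson events from exponential interarrival times; the function names, the seed, and all numerical values are illustrative assumptions.

```python
import random


def sample_compound_poisson(rate, t, draw_value, rng):
    """One realisation of X(t): add a value for every Poisson event in (0, t]."""
    total = 0.0
    clock = rng.expovariate(rate)          # time of the first event
    while clock <= t:
        total += draw_value(rng)
        clock += rng.expovariate(rate)     # exponential interarrival time
    return total


def sample_moments(samples):
    """Sample mean and unbiased sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, var


if __name__ == "__main__":
    rng = random.Random(12345)
    rate, t = 3.0, 2.0                     # Poisson parameter and time horizon
    draws = [sample_compound_poisson(rate, t, lambda r: r.expovariate(0.5), rng)
             for _ in range(20_000)]
    mean, var = sample_moments(draws)
    print("sample mean    ", mean, " theory", 2.0 * rate * t)
    print("sample variance", var, " theory", (4.0 + 4.0) * rate * t)
```

The estimates fluctuate with the seed and sample size, so agreement with the theoretical values is only approximate.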
THE DISTRIBUTION FUNCTION OF X(t) 
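The distribution function for X(t) can be represented explicitly after conditioning 
on the values of N(t). Thus we obtain 

$$\begin{aligned}
\Pr\{X(t) \le x\} &= \sum_{n=0}^{\infty} \Pr\left\{\sum_{k=1}^{n} Y_k \le x \,\Big|\, N(t) = n\right\} \frac{(\lambda t)^n e^{-\lambda t}}{n!} \\
&= \sum_{n=0}^{\infty} \frac{(\lambda t)^n e^{-\lambda t}}{n!}\,F^{(n)}(x) \qquad (\text{since } N(t) \text{ is independent of } \{Y_k\}), 
\end{aligned} \tag{9.3}$$

where 

$$F^{(n)}(x) = \Pr\{Y_1 + \cdots + Y_n \le x\}$$

with 

$$F^{(0)}(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ 0 & \text{for } x < 0. \end{cases}$$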

Example 1. Let N(t) be the number of shocks to a system up to time t and let $Y_i$ 
be the damage or wear sustained by the ith shock. We assume that damage is 
positive, that is, $\Pr\{Y_i > 0\} = 1$, and that the damage accumulates additively 
so that X(t) is the total damage up to time t. Suppose the system continues to 
operate as long as the total damage is strictly less than some critical value a and 
fails in the contrary circumstance. Let T be the time of failure. Then 

$$\{T > t\} \quad \text{if and only if} \quad \{X(t) < a\}. \tag{9.4}$$

In view of (9.3) and (9.4), we have 

$$\Pr\{T > t\} = \sum_{n=0}^{\infty} \frac{(\lambda t)^n e^{-\lambda t}}{n!}\,F^{(n)}(a). \tag{9.5}$$

All summands are nonnegative, so we may certainly interchange integration and 
summation to get 

$$\begin{aligned}
E[T] &= \int_0^{\infty} \Pr\{T > t\}\,dt \\
&= \sum_{n=0}^{\infty} \left(\int_0^{\infty} \frac{(\lambda t)^n e^{-\lambda t}}{n!}\,dt\right) F^{(n)}(a) = \lambda^{-1} \sum_{n=0}^{\infty} F^{(n)}(a),
\end{aligned}$$

an expression for the mean time to failure. 
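Consider the special case where $Y_1, Y_2, \ldots$ each follow the exponential 
distribution, 

$$\Pr\{Y_1 \le y\} = 1 - e^{-\mu y}, \qquad y \ge 0.$$

Then $Y_1 + \cdots + Y_n$ follows a gamma distribution, with 

$$F^{(n)}(a) = 1 - \sum_{k=0}^{n-1} \frac{(\mu a)^k e^{-\mu a}}{k!} = \sum_{k=n}^{\infty} \frac{(\mu a)^k e^{-\mu a}}{k!}, \qquad n \ge 0,$$

and 

$$\begin{aligned}
\sum_{n=0}^{\infty} F^{(n)}(a) &= \sum_{n=0}^{\infty} \sum_{k=n}^{\infty} \frac{(\mu a)^k e^{-\mu a}}{k!} \\
&= \sum_{k=0}^{\infty} \sum_{n=0}^{k} \frac{(\mu a)^k e^{-\mu a}}{k!} \\
&= \sum_{k=0}^{\infty} (k + 1)\,\frac{(\mu a)^k e^{-\mu a}}{k!} \\
&= 1 + \mu a.
\end{aligned}$$

In the special case at hand, we have 

$$E[T] = \lambda^{-1} \sum_{n=0}^{\infty} F^{(n)}(a) = (1 + \mu a)/\lambda.$$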
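As a numerical cross-check of this special case, the series for the mean time to failure can be summed directly and compared with the closed form. The sketch below uses hypothetical names (`shock_rate` for the Poisson parameter, `damage_rate` for the exponential parameter of the damages) and truncates the series at an arbitrary `n_max`.

```python
import math


def gamma_cdf_integer_shape(n, rate, a):
    """F^(n)(a) = Pr{Y_1 + ... + Y_n <= a} for exponential(rate) summands,
    computed as one minus the Poisson head sum_{k<n} (rate*a)^k e^{-rate*a} / k!."""
    if n == 0:
        return 1.0
    term = math.exp(-rate * a)     # k = 0 term of the Poisson distribution
    head = 0.0
    for k in range(n):
        head += term
        term *= rate * a / (k + 1)
    return 1.0 - head


def mean_time_to_failure(shock_rate, damage_rate, a, n_max=200):
    """Truncated version of E[T] = (1/shock_rate) * sum_n F^(n)(a)."""
    series = sum(gamma_cdf_integer_shape(n, damage_rate, a) for n in range(n_max))
    return series / shock_rate


if __name__ == "__main__":
    shock_rate, damage_rate, a = 2.0, 1.5, 4.0     # illustrative values
    print(mean_time_to_failure(shock_rate, damage_rate, a))
    print((1.0 + damage_rate * a) / shock_rate)    # closed form
```

The truncation is harmless here because the terms $F^{(n)}(a)$ decay faster than geometrically once n exceeds the product of the damage rate and a.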


SUM OF INDEPENDENT COMPOUND POISSON PROCESSES 


Compound Poisson processes have many desirable properties. Here we display 
one of these: that the sum of two independent compound Poisson processes is 
itself a compound Poisson process. For k = 1, 2, let $\{X_k(t), t \ge 0\}$ be compound 
Poisson processes, mutually independent, and with Poisson rate parameter $\lambda_k$. 
Let $\varphi_k$ be the characteristic function of the "value" sequence associated with the 
kth process. Form $X(t) = X_1(t) + X_2(t)$. We claim that $\{X(t), t \ge 0\}$ is a 
compound Poisson process with Poisson rate $\lambda = \lambda_1 + \lambda_2$ and associated 
characteristic function 

$$\varphi(u) = \frac{\lambda_1}{\lambda_1 + \lambda_2}\,\varphi_1(u) + \frac{\lambda_2}{\lambda_1 + \lambda_2}\,\varphi_2(u).$$

This characteristic function corresponds to that of a random variable Y which 
with probability $\lambda_1/(\lambda_1 + \lambda_2)$ assumes the value Y(1) and with probability 
$\lambda_2/(\lambda_1 + \lambda_2)$ assumes the value Y(2), where Y(1) and Y(2) are random variables 
with characteristic functions $\varphi_1$ and $\varphi_2$, respectively. 
Clearly $\{X(t), t \ge 0\}$ has stationary independent increments, since both 
$\{X_1(t), t \ge 0\}$ and $\{X_2(t), t \ge 0\}$ do. In view of Theorem 9.1 we need only 
compute the characteristic function of X(t). Thus, 

$$\begin{aligned}
\varphi_{X(t)}(u) &= \varphi_{X_1(t)}(u)\,\varphi_{X_2(t)}(u) \\
&= \exp\{-\lambda_1 t[1 - \varphi_1(u)]\}\exp\{-\lambda_2 t[1 - \varphi_2(u)]\} \\
&= \exp\{-\lambda t[1 - p_1\varphi_1(u) - p_2\varphi_2(u)]\},
\end{aligned}$$

where $\lambda = \lambda_1 + \lambda_2$ and $p_k = \lambda_k/\lambda$. Thus $\varphi_{X(t)}(u)$ has the desired form and 
$\{X(t), t \ge 0\}$ is a compound Poisson process. 


Example 2. Taxis arrive at a stand according to a Poisson process with rate 
$\lambda_1$. Customers arrive at the stand according to an independent Poisson process 
with rate $\lambda_2$. If a taxi arrives and individuals are waiting, the first person in line 
is served; if no individuals are waiting, the taxi waits. If a person arrives and 
there are taxis waiting, the person requisitions the first taxi; if no taxis are 
available, the person waits. 

Let X(t) be the number of taxis waiting at time t if taxis are waiting for people, 
and be minus the number of people waiting at time t if people are waiting for 
taxis. Then $\{X(t), t \ge 0\}$ is a stochastic process with state space $\{0, \pm 1, \pm 2, \ldots\}$ 
which may be written, assuming X(0) = 0, in the form $X(t) = X_1(t) + X_2(t)$, 
where $\{X_1(t), t \ge 0\}$ is a Poisson process with rate $\lambda_1$ and $\{X_2(t), t \ge 0\}$ is the 
negative of a Poisson process having rate $\lambda_2$. The result on the sum of compound 
Poisson processes tells us that X(t) is a compound Poisson process having rate 
$\lambda = \lambda_1 + \lambda_2$ and whose "values" $Y_1, Y_2, \ldots$ have the common discrete 
distribution 

$$p(k) = \begin{cases} \lambda_1/(\lambda_1 + \lambda_2) & \text{if } k = +1, \\ \lambda_2/(\lambda_1 + \lambda_2) & \text{if } k = -1, \\ 0 & \text{otherwise.} \end{cases}$$

If $S_n = Y_1 + \cdots + Y_n$ then $(n + S_n)/2$ is distributed binomially with 
parameters n and $p = \lambda_1/(\lambda_1 + \lambda_2)$ so that for $(n + k)/2 = 0, 1, \ldots, n$ the jump of the 
distribution at k is 

$$F^{(n)}(k) - F^{(n)}(k-) = \Pr\{S_n = k\} = \binom{n}{(n + k)/2}\,p^{(n+k)/2}(1 - p)^{(n-k)/2}.$$

We apply (9.3) to compute the probability distribution of X(t). Since $F^{(n)}(k) - F^{(n)}(k-)$ 
is zero unless $(n - k)/2 = 0, 1, \ldots, n$, we change variables, replacing 
n by $\nu = (n - k)/2$, to get 

$$\begin{aligned}
\Pr\{X(t) = k\} &= \sum_{n=0}^{\infty} \frac{(\lambda t)^n e^{-\lambda t}}{n!}\,[F^{(n)}(k) - F^{(n)}(k-)] \\
&= e^{-\lambda t} \sum_{\nu=0}^{\infty} \frac{(\lambda t)^{2\nu + k}\,p^{\nu + k}(1 - p)^{\nu}}{\nu!\,(\nu + k)!}.
\end{aligned}$$

In terms of the modified Bessel function, 

$$I_r(x) = \sum_{\nu=0}^{\infty} \frac{1}{\nu!\,\Gamma(\nu + r + 1)}\left(\frac{x}{2}\right)^{2\nu + r},$$

this becomes 

$$\Pr\{X(t) = k\} = \left(\frac{p}{1 - p}\right)^{k/2} e^{-\lambda t}\,I_k\!\left(2\lambda t\sqrt{p(1 - p)}\right).$$
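The series for $\Pr\{X(t) = k\}$ can be checked against the elementary description of X(t) as the difference of two independent Poisson counts. The sketch below is illustrative only: the function names and the truncation level `terms` are assumptions, and negative k is handled by exchanging the roles of taxis and customers, a symmetry step that is not spelled out in the text.

```python
import math


def taxi_pmf_series(k, lam1, lam2, t, terms=60):
    """Pr{X(t) = k} from the compound Poisson series, truncated after `terms` terms."""
    if k < 0:
        # Mirror image: -X(t) is the same process with the two rates exchanged.
        return taxi_pmf_series(-k, lam2, lam1, t, terms)
    lam = lam1 + lam2
    p = lam1 / lam
    total = 0.0
    for v in range(terms):
        total += ((lam * t) ** (2 * v + k) * p ** (v + k) * (1.0 - p) ** v
                  / (math.factorial(v) * math.factorial(v + k)))
    return math.exp(-lam * t) * total


def poisson_pmf(n, mean):
    return math.exp(-mean) * mean ** n / math.factorial(n)


def taxi_pmf_direct(k, lam1, lam2, t, terms=60):
    """Pr{N1 - N2 = k} for independent Poisson counts of taxis (N1) and customers (N2)."""
    return sum(poisson_pmf(j + k, lam1 * t) * poisson_pmf(j, lam2 * t)
               for j in range(max(0, -k), terms))


if __name__ == "__main__":
    lam1, lam2, t = 1.2, 0.8, 1.5          # illustrative rates and time
    for k in range(-3, 4):
        print(k, taxi_pmf_series(k, lam1, lam2, t), taxi_pmf_direct(k, lam1, lam2, t))
```

The two columns agree to rounding error, which reflects the fact that the series is the Bessel-function form of the distribution of a difference of two independent Poisson variables.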


LEVY PROCESSES 


A stochastic process having stationary independent increments and continuous 
in probability in the sense that for every positive $\varepsilon$, 

$$\lim_{s \to t} \Pr\{|X(t) - X(s)| > \varepsilon\} = 0, \qquad t \ge 0,$$


is called a Lévy process. A compound Poisson process is a Lévy process; so is 
Brownian motion and the so-called “uniform translation,” the process U(t) = at 
for t > 0, where a is a fixed constant. While beyond the scope of this book, it is 
a fact that the general Lévy process can be represented as a sum of a Brownian 
motion, a uniform translation, and a limit (actually, an integral) of a one-param- 
eter family of compound Poisson processes, where all the contributing basic 
processes are mutually independent. 

Let $\{X(t), t \ge 0\}$ be a compound Poisson process. Consulting Eq. (9.1) we 
see that 

$$\varphi_{X(1)}(u) = [\varphi_{X(t)}(u)]^{1/t},$$

or 

$$\varphi_{X(t)}(u) = [\varphi_{X(1)}(u)]^{t}. \tag{9.6}$$


A random variable X is called infinitely divisible if for every positive integer n, 
there are n independent and identically distributed random variables $X_1^{(n)}, \ldots, X_n^{(n)}$ 
such that their sum $X_1^{(n)} + \cdots + X_n^{(n)}$ has the same distribution as X. In 
terms of characteristic functions the requirement becomes that for every 
positive integer n, there exists a characteristic function $\varphi_n$ for which 

$$E[e^{iuX}] = [\varphi_n(u)]^n.$$

Where $\{X(t), t \ge 0\}$ is a compound Poisson process, inspection of (9.6) reveals 
that X(1) is infinitely divisible. Both Brownian motion $\{W(t), t \ge 0\}$ and 
uniform translation $\{U(t), t \ge 0\}$ possess the property expressed in (9.6), so that 
W(1) and U(1) are infinitely divisible as well. It is readily checked that the sum 
of a finite set of independent infinitely divisible random variables is itself infinitely 
divisible and the limit, in distribution, of a sequence of infinitely divisible random 
variables is infinitely divisible. The latter fact follows from Lévy's convergence 
criterion, mentioned in Chapter 1. 

Since every Lévy process {L(t),t > 0} may be written as a sum of a 
Brownian motion, a uniform translation and a limit of compound Poisson 
processes, it follows that L(1) for such a process is always an infinitely divisible 
random variable. 

A converse is also valid. For every infinitely divisible distribution, there 
exists a Lévy process {L(t), t > 0} for which L(1) has the specified distribution. 

Since Eq. (9.6), or in this case 

$$\varphi_{L(t)}(u) = [\varphi_{L(1)}(u)]^{t},$$

is satisfied for every Lévy process, it follows that for an arbitrary such process 
E[L(t)] = tE[L(1)] 
and 
variance of L(t) = t x variance of L(1), 


whenever these moments exist. This was argued from first principles in Chapter 1. 


DECOMPOSITION OF POISSON PROCESSES 


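Let $X(t) = \sum_{k=1}^{N(t)} Y_k$, $t \ge 0$, be a compound Poisson process, where $\{N(t), t \ge 0\}$ 
is a Poisson process with parameter $\lambda$, and $Y_1, Y_2, \ldots$ are independent 
identically distributed random variables, independent of $\{N(t), t \ge 0\}$. Construct 
two new processes as follows: Let 

$$U_k = \begin{cases} Y_k & \text{if } Y_k \ge 0, \\ 0 & \text{if } Y_k < 0, \end{cases} \qquad V_k = \begin{cases} 0 & \text{if } Y_k \ge 0, \\ Y_k & \text{if } Y_k < 0. \end{cases}$$

Then set 

$$X_1(t) = \sum_{k=1}^{N(t)} U_k \qquad \text{and} \qquad X_2(t) = \sum_{k=1}^{N(t)} V_k, \qquad t \ge 0. \tag{9.7}$$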


Then $\{U_k\}$ is a sequence of independent and identically distributed random 
variables, independent of the Poisson process $\{N(t), t \ge 0\}$, and thus $\{X_1(t), t \ge 0\}$ 
is a compound Poisson process. Similarly $\{X_2(t), t \ge 0\}$ is a compound 
Poisson process. What is also important, and more interesting, is that the two 
processes are independent of one another. By this it is meant that for every 
finite set of ordered time points $t_{11}, t_{12}, \ldots, t_{1n}$ and $t_{21}, t_{22}, \ldots, t_{2m}$, the vectors 

$$(X_1(t_{11}), X_1(t_{12}), \ldots, X_1(t_{1n})) \qquad \text{and} \qquad (X_2(t_{21}), X_2(t_{22}), \ldots, X_2(t_{2m})) \tag{9.8}$$

are independent. We leave it to the reader to verify that only the case m = n 
and $t_{11} = t_{21} = t_1 < t_{12} = t_{22} = t_2 < \cdots < t_{1n} = t_{2n} = t_n$ need be considered 
(combine both sets of time points). The random vectors in (9.8) are 
independent if and only if their joint characteristic function factors into the 
marginal characteristic functions. (See Problem 11, Chapter 1.) That is, we 
need to show for all real numbers $u_{11}, \ldots, u_{1n}, u_{21}, \ldots, u_{2n}$, that 

$$E\left[\exp\left\{i \sum_{k=1}^{n} [u_{1k}X_1(t_k) + u_{2k}X_2(t_k)]\right\}\right] = E\left[\exp\left\{i \sum_{k=1}^{n} u_{1k}X_1(t_k)\right\}\right] \times E\left[\exp\left\{i \sum_{k=1}^{n} u_{2k}X_2(t_k)\right\}\right]. \tag{9.9}$$


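Since the vector-valued process $\{(X_1(t), X_2(t)); t \ge 0\}$ also has stationary 
independent increments, we may execute the computations in the manner 
preceding Theorem 9.1 and deduce thereby 

$$\begin{aligned}
&E\left[\exp\left\{i \sum_{k=1}^{n} u_{1k}X_1(t_k) + i \sum_{k=1}^{n} u_{2k}X_2(t_k)\right\}\right] \\
&\quad= E\left[\exp\left\{i \sum_{k=1}^{n} a_{1k}[X_1(t_k) - X_1(t_{k-1})] + i \sum_{k=1}^{n} a_{2k}[X_2(t_k) - X_2(t_{k-1})]\right\}\right] \qquad (\text{where } a_{jk} = u_{jk} + \cdots + u_{jn}) \\
&\quad= \prod_{k=1}^{n} E[\exp\{ia_{1k}[X_1(t_k) - X_1(t_{k-1})] + ia_{2k}[X_2(t_k) - X_2(t_{k-1})]\}] \\
&\quad= \prod_{k=1}^{n} E[\exp\{ia_{1k}X_1(t_k - t_{k-1}) + ia_{2k}X_2(t_k - t_{k-1})\}]. 
\end{aligned} \tag{9.10}$$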


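Similarly, since $\{X_j(t), t \ge 0\}$, j = 1, 2, each exhibit stationary independent increments, 
we obtain (see (9.2)) 

$$E\left[\exp\left\{i \sum_{k=1}^{n} u_{jk}X_j(t_k)\right\}\right] = \prod_{k=1}^{n} \varphi_{X_j(t_k - t_{k-1})}(a_{jk}), \qquad j = 1, 2. \tag{9.11}$$

By noting the correspondence between the individual terms in the products of 
(9.10) and (9.11) we see that to prove (9.9) it suffices to consider the case n = 1 
and prove that 

$$E[\exp\{i[u_1X_1(t) + u_2X_2(t)]\}] = E[\exp\{iu_1X_1(t)\}]\,E[\exp\{iu_2X_2(t)\}]. \tag{9.12}$$

We condition now on N(t) = n. Then $u_1X_1(t) + u_2X_2(t) = W_1 + \cdots + W_n$, 
where $W_1, \ldots, W_n$ are independent and identically distributed: 

$$W_1 = \begin{cases} u_1Y_1 & \text{if } Y_1 \ge 0, \\ u_2Y_1 & \text{if } Y_1 < 0. \end{cases}$$

Thus 

$$\begin{aligned}
E[\exp\{i[u_1X_1(t) + u_2X_2(t)]\} \mid N(t) = n] &= E[\exp\{i(W_1 + \cdots + W_n)\}] \\
&= \{E[\exp(iW_1)]\}^n \\
&= [\varphi_W(1)]^n,
\end{aligned}$$

where $\varphi_W(1) = E[\exp(iW_1)]$. Removing the condition N(t) = n gives 

$$\begin{aligned}
E[\exp\{i[u_1X_1(t) + u_2X_2(t)]\}] &= \sum_{n=0}^{\infty} [\varphi_W(1)]^n\,\frac{e^{-\lambda t}(\lambda t)^n}{n!} \\
&= \exp\{-\lambda t[1 - \varphi_W(1)]\}. 
\end{aligned} \tag{9.13}$$

Now 

$$\begin{aligned}
\varphi_W(1) &= E[\exp(iW_1)] \\
&= E[\exp\{i(u_1Y_1^+ + u_2Y_1^-)\}] \\
&= \int_{(-\infty, 0)} \exp(iu_2y)\,dF_Y(y) + \int_{[0, +\infty)} \exp(iu_1y)\,dF_Y(y) \\
&= \int_{-\infty}^{+\infty} \exp(iu_2y^-)\,dF_Y(y) + \int_{-\infty}^{+\infty} \exp(iu_1y^+)\,dF_Y(y) - 1 \\
&\qquad (\text{the notation } y^+ = \max(y, 0),\; y^- = \min(y, 0) \text{ is used}) \\
&= \varphi_2(u_2) + \varphi_1(u_1) - 1, 
\end{aligned} \tag{9.14}$$

where 

$$\varphi_1(u) = E[\exp(iuU_1)] = E[\exp(iuY_1^+)]$$

and 

$$\varphi_2(u) = E[\exp(iuV_1)] = E[\exp(iuY_1^-)].$$

Then, substituting (9.14) in (9.13) gives 

$$\begin{aligned}
E[\exp\{i[u_1X_1(t) + u_2X_2(t)]\}] &= \exp\{-\lambda t[1 - \varphi_1(u_1) - \varphi_2(u_2) + 1]\} \\
&= \exp\{-\lambda t[1 - \varphi_1(u_1)]\}\exp\{-\lambda t[1 - \varphi_2(u_2)]\} \\
&= E[\exp\{iu_1X_1(t)\}]\,E[\exp\{iu_2X_2(t)\}],
\end{aligned}$$

which completes the proof of independence. 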

The restriction to the partition between $Y_k \ge 0$ versus $Y_k < 0$ served only for 
convenience in exposition. The same steps will verify the following theorem, 
which decomposes an arbitrary compound Poisson process into independent 
compound Poisson subprocesses. 

Theorem 9.2. Let $X(t) = \sum_{k=1}^{N(t)} Y_k$, $t \ge 0$, be a compound Poisson process, 
where $\{N(t), t \ge 0\}$ is a Poisson process with parameter $\lambda$ and $\{Y_k\}$ is a sequence of 
independent and identically distributed random variables. Let $A_1, \ldots, A_m$ be a 
partition of the space of possible values for $Y_k$, that is, $A_i \cap A_j = \emptyset$ if $i \ne j$ and 
$\Pr\{Y_k \in A_1 \cup \cdots \cup A_m\} = 1$. Let 

$$Y_k^{(i)} = \begin{cases} Y_k & \text{if } Y_k \in A_i, \\ 0 & \text{if } Y_k \notin A_i, \end{cases}$$

for i = 1, ..., m, and define 

$$X_i(t) = \sum_{k=1}^{N(t)} Y_k^{(i)}, \qquad i = 1, \ldots, m.$$

Then each $\{X_i(t), t \ge 0\}$ is a compound Poisson process and the processes 

$$\{X_1(t), t \ge 0\},\; \{X_2(t), t \ge 0\},\; \ldots,\; \{X_m(t), t \ge 0\}$$

are mutually independent. 


THE POISSON POINT PROCESS ASSOCIATED WITH A COMPOUND POISSON PROCESS 


Consider events occurring on the positive time axis $(0, \infty)$ in accordance with a 
Poisson process having parameter $\lambda$. Let $T_k$ be the time of the kth event. Then 

$$T_1,\; T_2 - T_1,\; T_3 - T_2,\; \ldots$$

are independent random variables, each exponentially distributed with 
parameter $\lambda$. Let $Y_k$ be a value associated with the kth event and suppose $Y_1, Y_2, \ldots$ are 
independent and identically distributed, independent of the Poisson process, 
and follow the common distribution function F. 

We plot the points of the coordinate pairs $(T_1, Y_1), (T_2, Y_2), \ldots$. Fig. 3 
illustrates the plotting. 

FIG. 3 The point process associated with a compound Poisson process (horizontal axis: Time, with the event times $T_1, T_2, \ldots$ marked). 




For each set A in the positive half plane, let K(A) be the number of plotted 
points that fall in A. The process $\{K(A), A \subset (-\infty, +\infty) \times (0, \infty)\}$ is a point 
process (see Chapter 1) called the point process associated with a compound 
Poisson process. 


Theorem 9.3. Let $A, A_1, \ldots, A_m$ be subsets of the positive half plane. Then 

(i) K(A) has a Poisson distribution with mean 

$$\mu(A) = \iint_A \lambda\,dt\,dF(y),$$

and 

(ii) If $A_1, \ldots, A_m$ are disjoint, then $K(A_1), \ldots, K(A_m)$ are independent random 
variables. 


Proof. We will prove this only when $A, A_1, \ldots, A_m$ are rectangles. The general
case may be obtained by approximating an arbitrary set with a grid of rectangles.
First suppose

\[ A = (s, t] \times (x, y]. \]

Then

\[ \mu(A) = \lambda(t - s)[F(y) - F(x)]. \]

Let $p = F(y) - F(x)$, and let $N$ be the number of time points $T_i$ falling in $(s, t]$.
Then $N$ has the Poisson distribution with mean $\lambda(t - s)$ and, conditioned on
$N = n$, $K(A)$ has a binomial distribution with parameters $n$ and $p$. Thus for
$0 < |u| < 1$,

\[
\begin{aligned}
E[u^{K(A)}] &= \sum_{n=0}^{\infty} E[u^{K(A)} \mid N = n] \Pr\{N = n\} \\
&= \sum_{n=0}^{\infty} (1 - p + pu)^n \, \frac{[\lambda(t - s)]^n e^{-\lambda(t - s)}}{n!} \\
&= \exp[-\lambda(t - s)p(1 - u)].
\end{aligned}
\]

This is the generating function of a Poisson random variable having mean
$\mu(A) = \lambda(t - s)p$, which completes the proof of part (i).

For part (ii) let us consider the case in which the rectangles $A_1, \ldots, A_m$
completely partition the space $(0, t] \times (-\infty, +\infty)$. This is illustrated in Fig. 4.
By appending rectangles, if necessary, the general case can be subsumed in this
one. We condition on $N(t) = n$. Then, according to Theorem 2.3 of Chapter 4,
the times $T_1, \ldots, T_n$ have the distribution of the ordered values of $n$ independent
observations, uniformly distributed on $[0, t]$. The probability that any single
such observation would result in a point in $A_i = (s_i, t_i] \times (x_i, y_i]$ is

\[ p_i = \frac{(t_i - s_i)[F(y_i) - F(x_i)]}{t} = \frac{1}{\lambda t}\,\mu(A_i). \]

FIG. 4 (Rectangles partitioning the strip $(0, t] \times (-\infty, +\infty)$; horizontal axis: time, vertical axis: value.)


Since the uniform observations are independent and the values are independent,
the $n$ points so scattered will result in a multinomial distribution for

\[ K(A_1), \ldots, K(A_m) \]

with parameters $N = n$, $p_1, \ldots, p_m$. Thus the joint generating function for
$(K(A_1), \ldots, K(A_m))$, under the condition $N = n$, is the multinomial generating
function

\[ E[u_1^{K(A_1)} \cdots u_m^{K(A_m)} \mid N = n] = (p_1 u_1 + \cdots + p_m u_m)^n, \]

and, since $p_1 + \cdots + p_m = 1$,

\[
\begin{aligned}
E[u_1^{K(A_1)} \cdots u_m^{K(A_m)}] &= \sum_{l=0}^{\infty} (p_1 u_1 + \cdots + p_m u_m)^l \, \frac{(\lambda t)^l e^{-\lambda t}}{l!} \\
&= \exp[-\lambda t(1 - p_1 u_1 - \cdots - p_m u_m)] \\
&= \prod_{i=1}^{m} \exp[-\lambda t p_i (1 - u_i)] \\
&= \prod_{i=1}^{m} \exp[-\mu(A_i)(1 - u_i)].
\end{aligned}
\]

Thus, the joint generating function of $K(A_1), \ldots, K(A_m)$ factors into the
appropriate marginal generating functions and the proof of independence is
complete. ∎
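
The generating function identity used in part (i) of the proof can be checked numerically. The sketch below sums the conditional series term by term and compares it with the closed form $\exp[-\lambda(t - s)p(1 - u)]$. The function names, the truncation at 200 terms, and the parameter values used in the check are assumptions made for illustration only.

```python
import math

def pgf_by_conditioning(lam, s, t, p, u, terms=200):
    """Truncated series: sum over n of (1 - p + p*u)^n * Pr{N = n}, N ~ Poisson(lam*(t - s))."""
    mean = lam * (t - s)
    total = 0.0
    weight = math.exp(-mean)          # Pr{N = 0}
    for n in range(terms):
        total += (1.0 - p + p * u) ** n * weight
        weight *= mean / (n + 1)      # Pr{N = n + 1} from Pr{N = n}
    return total

def pgf_closed_form(lam, s, t, p, u):
    """Generating function of a Poisson variable with mean lam*(t - s)*p."""
    return math.exp(-lam * (t - s) * p * (1.0 - u))

# Hypothetical values: lam = 3, rectangle (1, 4] in time, p = F(y) - F(x) = 0.3, u = 0.5.
series_value = pgf_by_conditioning(3.0, 1.0, 4.0, 0.3, 0.5)
closed_value = pgf_closed_form(3.0, 1.0, 4.0, 0.3, 0.5)
```

With these values both expressions reduce to $e^{-1.35}$, which is what the comparison of `series_value` and `closed_value` confirms.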


Example 3. It was noted that the logarithm of mean tensile strength of brittle 
fibers, such as boron filaments, in general varied linearly with the length of the 
filament, but that this relation did not hold for short filaments. It was suspected 
that the breakdown in the log-linear relation might be due to testing or mea- 
surement problems, rather than be an inherent property of short filaments. 
Evidence supporting this was the observation that short filaments would break 
in the test clamps, rather than between them as desired, more often than would 
long filaments. Some means of correcting observed mean strengths to account 
for filaments breaking in, rather than between, the clamps was desired. It was 
decided to compute the ratio between the actual mean strength and an ideal 
mean strength, obtained under the assumption of no stress in the clamps, as a 
correction factor. 

Since the molecular bonding strength is several orders of magnitude higher
than generally observed strengths, it was felt that failure typically was caused by
flaws. There are a number of different types of flaws, both internal flaws such as
voids, inclusions, and weak grain boundaries, and external or surface flaws,
such as notches and cracks, which cause stress concentrations. Let us suppose
that flaws occur independently in a Poisson manner along the length of the
filament. We let $Y_k$ be the reduction in strength caused by the $k$th flaw and
suppose $Y_k$ has the probability density function $f(y)$, $y > 0$. We have plotted
this in Fig. 5. The flaws reduce the strength. Opposing the strength is the stress
in the filament. Ideally, the stress should be constant along the filament between
the clamp faces, and zero within the clamp. In practice the stress tapers off to
zero over some positive length in the clamp. As a first approximation it is
reasonable to assume that the stress decreases linearly. Let $l$ be the length of
the clamp and $t$ the distance between the clamps, called the gauge length, as
illustrated in Fig. 5.

FIG. 5 (Strength and stress plotted against distance along the filament, showing the theoretical maximum molecular bonding strength $y^*$, the ideal stress, the actual stress, and the regions $A$ and $B$.)

The filament holds as long as the stress has not exceeded the strength as
determined by the weakest flaw. That is, the filament will support a stress of $y$
as long as no flaw points fall in the stress trapezoid of Fig. 5. The number of
points in this trapezoid has a Poisson distribution with mean $\mu(B) + 2\mu(A)$. In
particular, no points fall there with probability

\[ e^{-[\mu(B) + 2\mu(A)]}. \]

If we let $S$ be the strength of the filament, then

\[ \Pr\{S > y\} = e^{-2\mu(A) - \mu(B)}. \]


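We compute

\[
\begin{aligned}
\mu(A) &= \int_0^l \int_0^{xy/l} \lambda f(s) \, ds \, dx \\
&= \lambda l \int_0^y \left(1 - \frac{s}{y}\right) f(s) \, ds
\end{aligned}
\]

and

\[ \mu(B) = \lambda t[1 - F(y)], \]

where $F(y) = \int_y^{\infty} f(s) \, ds$. Finally, the mean strength of the filament is

\[
\begin{aligned}
E[S] &= \int_0^{\infty} \Pr\{S > y\} \, dy \\
&= \int_0^{\infty} \exp\left\{ -\lambda t[1 - F(y)] - 2\lambda l \int_0^y \left(1 - \frac{s}{y}\right) f(s) \, ds \right\} dy.
\end{aligned}
\]

For an ideal filament we use the same expression but with $l = 0$.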


PROBLEMS


Elementary Problems 


1. It is assumed that the sizes of firms are quantized and that in each time increment $h$
the unit of a firm has a probability $\lambda h + o(h)$ of increasing one unit, independently of the
previous history of growth and its current size. Given that a firm is initially of size 1, what is
the probability that during the elapsed time $t$ the firm grows to total size $n$?


Hint: This is a Yule process; see Section 1 of Chapter 4.
Solution: $e^{-\lambda t}(1 - e^{-\lambda t})^{n-1}$.


2. $N$ bacteria are spread independently with uniform distribution on a microscope slide
of area $A$. An arbitrary region of area $a$ is selected for observation. Determine the
probability of $k$ bacteria within the region of area $a$.


Solution: $p(k) = \binom{N}{k} \left(\frac{a}{A}\right)^k \left(1 - \frac{a}{A}\right)^{N-k}$.


3. Show that as $N \to \infty$ and $a \to 0$ such that $(a/A)N \to c$ $(0 < c < \infty)$ then $p(k) \to
e^{-c} c^k / k!$.
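
The limit in Elementary Problem 3 can be observed numerically by comparing the binomial probability of Problem 2 with its Poisson limit while the product $(a/A)N$ is held fixed. This is only an illustrative sketch: the function names and the values $c = 2$, $k = 3$ are assumptions and are not part of the problem.

```python
import math

def binomial_pmf(n_total, k, frac):
    """Probability of k of n_total bacteria falling in a region of area fraction frac = a/A."""
    return math.comb(n_total, k) * frac**k * (1.0 - frac) ** (n_total - k)

def poisson_pmf(c, k):
    """Poisson probability e^{-c} c^k / k!."""
    return math.exp(-c) * c**k / math.factorial(k)

c, k = 2.0, 3   # hypothetical limit value c and count k
errors = {n: abs(binomial_pmf(n, k, c / n) - poisson_pmf(c, k))
          for n in (100, 10_000, 1_000_000)}
```

The entries of `errors` shrink as `n` grows, in line with the stated convergence.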


4. Let $\{X_i(t), t \ge 0\}_{i=1}^{n}$ be independent Poisson processes with the same parameter $\lambda$.
Find the distribution of the time until at least one event has occurred in every process.


Solution: $\Pr\{T \le t\} = (1 - e^{-\lambda t})^n$.


5. An experiment has $N$ possible outcomes each occurring with probability $p_i$,
$\sum_{i=1}^{N} p_i = 1$. Let $T$ be the number of trials necessary for all different outcomes to have occurred.
Show that

\[ E(T) = \int_0^{\infty} \left[ 1 - \prod_{i=1}^{N} (1 - e^{-p_i t}) \right] dt. \]


Hint: Consider performing the experiment at the event times of a Poisson process with
parameter $\lambda = 1$.
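
The integral formula of Elementary Problem 5 can be checked numerically. The sketch below evaluates the integral by composite Simpson quadrature on a truncated range and compares it with the inclusion-exclusion sum obtained by expanding the product and integrating term by term. The function names, the truncation point 400, the step count, and the test probabilities are assumptions made for illustration.

```python
import math
from itertools import combinations

def expected_trials_integral(probs, upper=400.0, steps=40000):
    """Composite Simpson approximation of the integral over [0, upper]; steps must be even."""
    def integrand(t):
        prod = 1.0
        for p in probs:
            prod *= 1.0 - math.exp(-p * t)
        return 1.0 - prod
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(j * h)
    return total * h / 3.0

def expected_trials_exact(probs):
    """Inclusion-exclusion: sum over nonempty subsets of (-1)^(|S|+1) / (sum of p_i in S)."""
    total = 0.0
    for size in range(1, len(probs) + 1):
        for subset in combinations(probs, size):
            total += (-1) ** (size + 1) / sum(subset)
    return total

equal = [1.0 / 3.0] * 3      # three equally likely outcomes
unequal = [0.5, 0.3, 0.2]    # hypothetical unequal probabilities
```

For three equally likely outcomes both functions give $3(1 + \tfrac{1}{2} + \tfrac{1}{3}) = 5.5$, the classical coupon-collector value.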


Problems 


1. Consider a two-dimensional Poisson process of particles in the plane with intensity 
parameter v. Determine the distribution F(x) of the distance between a particle and its 
nearest neighbor. Compute the mean distance. 


Answer: $F(x) = 1 - \exp(-\nu\pi x^2)$; $E[D] = 1/(2\sqrt{\nu})$.
2. Solve the preceding problem for a three-dimensional Poisson process.
Answer: $F(x) = 1 - \exp(-\tfrac{4}{3}\nu\pi x^3)$; $E[D] = \Gamma(\tfrac{1}{3})/(36\nu\pi)^{1/3}$.


3. Suppose a device is exposed to one of $k$ possible environments $E_1, E_2, \ldots, E_k$ which can
occur with respective probabilities $c_1, c_2, \ldots, c_k$ $(\sum_{i=1}^{k} c_i = 1)$. In each environment
dangerous peaks occur according to a Poisson process with parameter $\lambda_j$, $j = 1, 2, \ldots, k$.
Within the environment $E_j$ the conditional probability that the device fails, given that a peak
occurs, is $p_j$. Find the probability that the device fails within a given length of time $t$.


Answer: $\Pr\{T \le t\} = 1 - \sum_{j=1}^{k} c_j \exp(-\lambda_j p_j t)$.


4. A group of n engineers is engaged in a project. The time before an engineer makes an 
error has the probability distribution F(t). If an error is made it can either be a type I error 
with probability p or of type II with probability 1 — p. A type I error is so serious that if 
anybody commits it at any given time the whole project is sure to run afoul. A type II error, 
however, is so slight that the only way it can ruin the whole project is if all engineers in- 
dependently commit this error. Compute the probability that the project is still on the right 
track at time t. 


Hint: Compute the probability that exactly $k$ engineers have made type II errors and the
others have no error by time $t$. Show that this probability is

\[ \binom{n}{k} [(1 - p)F(t)]^k [1 - F(t)]^{n-k}. \]


Answer: $[1 - pF(t)]^n - [(1 - p)F(t)]^n$.


5. Consider a circuit consisting of m subsystems in parallel, each subsystem consisting of 
n like components in series. Assume that component lives are independently and identically 
distributed according to F(t). 

Show that the probability that the circuit survives until time $t$ is given by

\[ 1 - \{1 - [1 - F(t)]^n\}^m. \]


6. Consider a sequence $i = 1, 2, \ldots, m$ of electrical components of a complex system $S$. Let
$F_i(t)$ denote the distribution function of the time until failure of the $i$th component. Let
$1 - p_i$ denote the conditional probability that if the $i$th component fails it will render the
whole system inoperative.

(i) A system of components is said to be semiparallel whenever the system fails only
if all components fail or a single component fails (say the $i$th) and with probability $1 - p_i$
the system becomes inoperative.

(ii) A system of components is said to be in series provided the system fails if a single
component fails.

The reliability of $S$ at time $t$ is defined to be the probability that the whole system $S$ is
operative.

(1) Suppose $F_i(t) = F(t)$, $p_i = p$ $(i = 1, 2, \ldots, m)$, and consider a semiparallel system.
Prove that

\[ \text{reliability at time } t = [1 - F(t) + pF(t)]^m - [pF(t)]^m. \]

(2) Let the conditions of (1) hold. Suppose $F(t) = 1 - e^{-\lambda t}$ $(\lambda > 0)$. Prove

\[ \text{expected time until failure} = \frac{1}{\lambda} \sum_{k=1}^{m} \frac{p^{m-k}}{k}. \]

Hint for (2): Let $u_m$ = expected time until failure for $m$ components in a semiparallel
system. Deduce the recursion formula

\[ u_m = \frac{1}{m\lambda} + p\,u_{m-1}. \]

7. Consider a collection of circles in the plane whose centers are distributed according to
a spatial Poisson process with parameter $\lambda|A|$, where $|A|$ denotes the area of the set $A$. (In
particular, the number of centers $\xi(A)$ in the set $A$ follows the distribution law $\Pr\{\xi(A) = k\}
= e^{-\lambda|A|}[(\lambda|A|)^k/k!]$.) The radius of each circle is assumed to be a random variable
independent of the location of the center of the circle with density function $f(r)$ and finite
second moment. Show that the family of random variables $C(r)$, the number of circles
which cover the origin and have centers at a distance less than $r$ from the origin, determines a
variable time Poisson process where the time variable is now taken to be the distance $r$ (cf.
Elementary Problem 12, Chapter 4).


Hint: Prove that an event occurring between $r$ and $r + dr$ (i.e., there is a circle of center in
the ring of radius $r$ to $r + dr$ which covers the origin) has probability $\lambda 2\pi r \, dr \int_r^{\infty} f(\rho) \, d\rho
+ o(dr)$ and events occurring over disjoint intervals constitute independent r.v.'s. Show that
$C(r)$ is a variable time (inhomogeneous) Poisson process with parameter

\[ \lambda(r) = 2\pi\lambda r \int_r^{\infty} f(\rho) \, d\rho. \]


8. Show that the number of circles which cover the origin is a Poisson random variable
with parameter $\lambda \int_0^{\infty} \pi r^2 f(r) \, dr$.


9. Consider spheres in three-dimensional space with centers distributed according to a
Poisson distribution with parameter $\lambda|A|$ where $|A|$ now represents the volume of the set
$A$. If the radii of all spheres are distributed according to $F(r)$ with density $f(r)$ and finite third
moment, show that the number of spheres which cover a point $t$ is a Poisson random variable
with parameter $\tfrac{4}{3}\lambda\pi \int_0^{\infty} r^3 f(r) \, dr$.


10. Suppose there is a reaction between the bacteria whenever two or more bacteria
have their centers less than a distance $r$ apart. Find the distribution function for the number
of reactions in the area $A$, valid in the limit as $r \to 0$, $N \to \infty$ with $\pi r^2 N^2/A \to \lambda$, $0 < \lambda < \infty$.


Answer: $p(l) = e^{-\lambda}(\lambda^l/l!)$.

11. Suppose new mutant types arise according to a Poisson process of parameter $\nu$. The
population generated by each new mutant fluctuates in accordance with the laws of a birth
and death process of infinitesimal parameters $\lambda_n = n\lambda$ and $\mu_n = n\mu$, where $\mu > \lambda$. Distinct
mutants create independent lines of growth. (Recall that since $\mu > \lambda$ each line becomes
extinct in finite time with probability 1; see Section 7 of Chapter 4.) Show that the number
of mutant lines $L_t$ existing at time $t$ has a Poisson distribution.


Answer: Let $Q(\xi)$ be the distribution function of the time to extinction of a linear growth
birth and death process $(\lambda_n = n\lambda, \mu_n = n\mu)$ starting with one individual. The parameter of
the Poisson distribution is

\[ \nu \int_0^t [1 - Q(\xi)] \, d\xi. \]


12 (continuation of Problem 11). Determine the limiting distribution of $L_t$ as $t \to \infty$.


Answer: Poisson distribution of parameter

\[ \nu \int_0^{\infty} [1 - Q(\xi)] \, d\xi = \frac{\nu}{\lambda} \log\left[\frac{\mu}{\mu - \lambda}\right]. \]

13. Suppose particles arrive in a pattern of a Poisson process with parameter $\lambda$. On
arrival each particle enters one of $r$ $(> 1)$ states with probabilities $p_1, p_2, \ldots, p_r$, respectively.
After arrival a specific particle undergoes changes of its state according to the laws of a time
homogeneous Markov process with transition probabilities $P_{ij}(t) = \Pr\{$state at time $t$ is $j \mid$
state at time 0 was $i\}$, $i, j = 1, \ldots, r$. Let $Y(t) = \{X_1(t), \ldots, X_r(t); t \ge 0\}$ be a vector-valued
stochastic process where $X_i(t)$ is the number of particles in state $i$ at time $t$. Prove that
$\{Y(t), t \ge 0\}$ is a time-homogeneous Markov process. (The states could be interpreted as
different stages of an illness.)


14 (continuation of Problem 13). Consider a single particle of type $i$ at time 0. Let

\[ \delta_j(t) = \begin{cases} 1 & \text{if at time } t \text{ this particle is in state } j, \\ 0 & \text{otherwise.} \end{cases} \]

Prove the generating function relation

\[ E[z_1^{\delta_1(t)} z_2^{\delta_2(t)} \cdots z_r^{\delta_r(t)}] = \sum_{j=1}^{r} z_j P_{ij}(t). \]


15 (continuation of Problem 14). Establish that the probability generating function of
$\{X_1(t), \ldots, X_r(t)\}$ is

\[
\begin{aligned}
\varphi(z_1, z_2, \ldots, z_r; t) &= E[z_1^{X_1(t)} z_2^{X_2(t)} \cdots z_r^{X_r(t)}] \\
&= \exp\left[ \lambda \sum_{i=1}^{r} p_i \sum_{k=1}^{r} (z_k - 1) \int_0^t P_{ik}(\tau) \, d\tau \right].
\end{aligned}
\]

Hint (cf. the analysis of Section 3):

1. Condition on the number of particles arriving up to time $t$.

2. Use the fact that in a Poisson process, given the number of events up to time $t$, the
times at which the events occur are independent and identically distributed uniformly in
$[0, t]$.

3. Use the fact that individual particles act independently of each other.


16. Consider the following two-type population growth model. The two types of particles
are either of normal or mutant form. A normal type lives a random length of time with
exponential distribution with mean $\lambda^{-1}$ and then gives birth to two normal types with
probability $p$ or to one normal and one mutant type with probability $q = 1 - p$. Each
normal offspring acts the same as the parent. A mutant type lives a random length of time
following an exponential distribution with mean $\mu^{-1}$ and then gives birth to exactly two
mutant types which behave the same way as their parent. Assume that all types act
independently. We start with one normal type. Let $\{X(t), Y(t)\}$ denote the number of normal
and mutant types in the system at time $t$. Find the probability generating functions $\psi_1(z, t)
= E[z^{X(t)}]$ and $\psi_2(z, t) = E[z^{Y(t)}]$.


Hint: (a) Show that $\{X(t)\}$ is a Yule process with parameter $\lambda p$.
(b) Derive the following integral equation for $\psi = \psi_2$ by conditioning on the time of
the first split,

\[ \psi(z, t) = e^{-\lambda t} + \int_0^t \lambda e^{-\lambda\tau} [p\psi^2(z, t - \tau) + q\,g(z, t - \tau)\psi(z, t - \tau)] \, d\tau, \]

where

\[ g(z, t) = z e^{-\mu t}[1 - z(1 - e^{-\mu t})]^{-1}. \]

Make a change of variables, $t - \tau = u$, in the integral and then differentiate with respect to
$t$ to get

\[ \frac{\partial\psi}{\partial t} = -\lambda\psi + \lambda\psi(p\psi + qg). \]

Let $\varphi = 1/\psi$. Then

\[ \frac{\partial\varphi}{\partial t} = \lambda\varphi - \lambda p - \lambda q g \varphi. \]

Solve this to show that

\[ \psi(z, t) = \frac{F(z, t)}{1 - \lambda p \int_0^t F(z, u) \, du}, \]

where $F(z, t) = e^{-\lambda t}[1 - z(1 - e^{-\mu t})]^{-\lambda q/\mu}$. In the special case $\lambda = \mu$, show that

\[ \psi(z, t) = \frac{z e^{-\lambda t}[1 - z(1 - e^{-\lambda t})]^{-q}}{z - 1 + [1 - z(1 - e^{-\lambda t})]^{p}}. \]

17. Customers arrive at a counter according to a Poisson process having rate $\lambda$. Each
customer pays a dollar at the time of his arrival. If a customer arrives at time $t$, his dollar is
discounted to a present value of $e^{-\beta t}$. Compute the mean total discounted payments of all
customers arriving in the first $T$ time units of operation. That is, if $t_i$ is the time of arrival
of the $i$th customer, and $N(T)$ is the number of customers arriving in the first $T$ units of
time, then find $E[\sum_{i=1}^{N(T)} \exp(-\beta t_i)]$.


18. Let $\{N(A), A \subset R^2\}$ be a homogeneous Poisson point process in the plane having
parameter $\lambda$. That is, (i) if $A_1, A_2, \ldots, A_n$ are disjoint subsets of the plane, then $N(A_1)$,
$N(A_2), \ldots, N(A_n)$ are independent random variables and (ii) if $A$ is a subset of the plane,
then $N(A)$ has a Poisson distribution with mean $\lambda|A| = \lambda \times$ area of $A$.

(a) Let $A_1$ and $A_2$ be arbitrary subsets of the plane. Compute $K(A_1, A_2)$ = covariance
of $N(A_1)$, $N(A_2)$. Note: If $X$ and $Y$ are random variables having finite second moments,
the covariance of $X$, $Y$ is given by $E[(X - E[X])(Y - E[Y])]$.

(b) Show that for arbitrary sets $A_1, A_2, \ldots, A_n$ and arbitrary real numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$
that

\[ \sum_{i,j=1}^{n} \alpha_i \alpha_j K(A_i, A_j) \ge 0. \]


19. The following model is proposed to describe the statistical distribution of the times of
lightning discharges in a localized storm. Very briefly, the model proposes a succession of
static electric charge buildups relieved by lightning discharges.

More precisely, we suppose that immediately after a discharge the static charge in the
atmosphere is zero. The charge then builds up in a deterministic manner in which $r(t)$ is the
charge in the atmosphere $t$ time units after the previous lightning flash.

A triggering event is required to initiate a lightning flash. Such events occur according 
to a Poisson process with parameter A. A given triggering event may or may not cause the 
flash. Whether it does or does not is a random event whose probability depends on the 
current charge level r(t). Let us suppose that, given a charge level r(t) = r, a triggering event, 
should one occur, will cause a flash with probability p(r). Then 0 < p(r) < 1 and it is reason- 
able to assume that p(r) is an increasing function of r. It is also reasonable to assume that 
r(t) is an increasing function of t. 


(Figure: charge at time $t$ plotted against time $t$, with the triggering events (Poisson, $\lambda$) and the times of flashes marked along the time axis.)

The triggering event at time $t_k$ causes a flash with probability $p[r(t_k)]$ and no flash with
probability $1 - p[r(t_k)]$. A flash did not occur for $t_2$ pictured here but did occur at $t_3$.


(a) Assume that at time t = 0, r(0) = 0, and let T be the random time of the first 
flash. Find the distribution function 


F(t) = Pr{T < t}. 


(b) Immediately after the first flash, say at time $T + 0$, again the charge $r(T + 0) = 0$,
and the process repeats, generating a sequence $T = T_1, T_2, T_3, \ldots$ of random times between
flashes, independent and each with distribution function $F(t)$. In terms of $F(t)$, what is the
mean number of flashes observed up to time $t_0$?

20. Let $\{X(t), t \ge 0\}$ be a stochastic process having stationary independent increments
and finite second moments $E[X(t)^2] < \infty$. Suppose $X(0) = 0$. We showed in Chapter 1
that the mean value and variance function for such a process were linear, say,

\[ E[X(t)] = \mu t \]

and

\[ E[\{X(t) - \mu t\}^2] = \sigma^2 t. \]

Fix $h > 0$ and consider the process $Z(t) = X(t + h) - X(t)$. In terms of $\mu$ and $\sigma^2$,
compute the mean value function, variance function and covariance function for the $Z$
process. That is, find

\[ m(t) = E[Z(t)], \]
\[ V(t) = E[\{Z(t) - m(t)\}^2], \]

and

\[ K(s, t) = E[\{Z(s) - m(s)\}\{Z(t) - m(t)\}]. \]


21. Let $0 = \tau_0 < \tau_1 < \tau_2 < \cdots$ be times of successive events of a Poisson process with
parameter $\lambda$. Suppose each event (independently of others) is recorded with probability $p$
and erased with probability $1 - p$. Show that the random variables $X_1(t)$ and $X_2(t)$ of
recorded and erased events respectively constitute independent Poisson processes with
parameters $\lambda p$ and $\lambda(1 - p)$.


Hint: For any $t > 0$, let $X_1(t)$ and $X_2(t)$ denote the number of recorded and erased events
respectively in $(0, t)$. Then their joint probability generating function, with $q = 1 - p$, is

\[
\begin{aligned}
g(z_1, z_2) &= E[z_1^{X_1(t)} z_2^{X_2(t)}] \\
&= E[E[z_1^{X_1(t)} z_2^{X_2(t)} \mid X_1(t) + X_2(t)]] \quad \text{(by conditioning on } X_1(t) + X_2(t)\text{)} \\
&= E[(pz_1 + qz_2)^{X_1(t) + X_2(t)}] \quad \text{(since given } X_1(t) + X_2(t),\ X_1(t) \text{ has a binomial distribution)} \\
&= e^{\lambda t(pz_1 + qz_2 - 1)} = e^{\lambda pt(z_1 - 1)} e^{\lambda qt(z_2 - 1)}.
\end{aligned}
\]
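
The factorization in the hint to Problem 21 can also be checked at the level of probabilities. The sketch below computes the joint probability of $j$ recorded and $k$ erased events by conditioning on the total (Poisson total, then binomial split) and compares it with the product of two Poisson probabilities with means $\lambda pt$ and $\lambda(1 - p)t$. The function names and the numerical values are assumptions chosen for illustration.

```python
import math

def poisson_pmf(mean, k):
    """Poisson probability e^{-mean} mean^k / k!."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def joint_pmf_by_conditioning(lam, t, p, j, k):
    """Pr{X1(t) = j, X2(t) = k}: Poisson(lam*t) total of j + k events, binomial split with p."""
    n = j + k
    return poisson_pmf(lam * t, n) * math.comb(n, j) * p**j * (1.0 - p) ** k

def joint_pmf_product(lam, t, p, j, k):
    """Product of the two marginal Poisson probabilities claimed in Problem 21."""
    return poisson_pmf(lam * p * t, j) * poisson_pmf(lam * (1.0 - p) * t, k)

lam, t, p = 1.5, 2.0, 0.4   # hypothetical rate, time, and recording probability
max_gap = max(abs(joint_pmf_by_conditioning(lam, t, p, j, k) - joint_pmf_product(lam, t, p, j, k))
              for j in range(6) for k in range(6))
```

The value of `max_gap` is zero up to rounding, which reflects the independence of the recorded and erased counts at a fixed time.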


22. In the above problem, let each event be classified, independently of others, into $k$
categories with probabilities $p_i$, $i = 1, 2, \ldots, k$, $\sum_{i=1}^{k} p_i = 1$. Let $X_i(t)$ be the number of
events that happen in $(0, t)$ and belong to the $i$th category, $i = 1, 2, \ldots, k$. Show that $\{X_1(t),
X_2(t), \ldots, X_k(t)\}$ are independent Poisson processes, with parameters
respectively $\lambda p_i$, $i = 1, 2, \ldots, k$.


23. Consider a Poisson process in $(0, \infty)$ with parameter $\lambda$. Suppose an event that occurs
at time $t$ is classified, independently of other events, into one of $k$ categories with probability
$p_i(t)$, for $i = 1, 2, \ldots, k$, $\sum_{i=1}^{k} p_i(t) = 1$. Assume each $p_i(t)$ is continuous in $t$. Let $X_i(t)$ denote
the number of events during the time duration $(0, t)$ belonging to the $i$th category. Show that
for each $i$, $\{X_i(t), t \ge 0\}$ is a variable time (i.e., inhomogeneous) Poisson process (cf.
Problem 12 of Chapter 4).

Hint: Show that $\Pr\{$an event of category $i$ occurs in $(t, t + h)\} = \lambda p_i(t)h + o(h)$ and the
event in braces is independent of the values of $X_i(\tau)$ for $\tau \le t$.


24. In Problem 23 show that the processes $\{X_i(t), t \ge 0\}$, $i = 1, 2, \ldots, k$, are independent.


Hint: Let $(a_i, b_i)$, $i = 1, 2, \ldots, k$, be any $k$ intervals in $[0, \infty)$. Define $Y_i = X_i(b_i) - X_i(a_i)$.
Show that $Y_1, Y_2, \ldots, Y_k$ are independent r.v.'s. To do this, condition on the number $N(T)$
of events of all categories in $(0, T)$ where $T = \max_{1 \le i \le k} b_i$. Given $N(T) = n$, the instants at
which the events occur in $(0, T)$ constitute $n$ independent observations from a uniform
distribution in $(0, T)$. Under the condition $N(T) = n$, $(Y_1, Y_2, \ldots, Y_k, n - \sum_{i=1}^{k} Y_i)$ has a
multinomial distribution with associated probabilities

\[ p_i = \frac{1}{T} \int_{a_i}^{b_i} p_i(s) \, ds, \qquad i = 1, 2, \ldots, k, \]

\[ p_{k+1} = 1 - \sum_{i=1}^{k} p_i. \]

Therefore,

\[ E[z_1^{Y_1} z_2^{Y_2} \cdots z_k^{Y_k} \mid N(T) = n] = (p_1 z_1 + p_2 z_2 + \cdots + p_k z_k + 1 - p_1 - \cdots - p_k)^n. \]

But $N(T)$ is distributed as a Poisson random variable with mean $\lambda T$. Hence

\[
\begin{aligned}
E[z_1^{Y_1} \cdots z_k^{Y_k}] &= \exp\left[ \lambda T \sum_{i=1}^{k} p_i (z_i - 1) \right] \\
&= \prod_{i=1}^{k} \exp[\lambda_i (z_i - 1)],
\end{aligned}
\]

where

\[ \lambda_i = \lambda \int_{a_i}^{b_i} p_i(t) \, dt. \]

The same argument applies if each interval $(a_i, b_i)$ is replaced by any finite union of intervals.


25. In Problem 23, let $\int_0^{\infty} p_i(t) \, dt < \infty$ for $i = 1, 2, \ldots, r$, where $r \le k - 1$. Prove that the
joint distribution of $(X_1(t), \ldots, X_r(t))$ tends in the limit $(t \to \infty)$ to that of $r$ independent
Poisson variables with means respectively $\lambda \int_0^{\infty} p_i(t) \, dt < \infty$, $i = 1, 2, \ldots, r$.


26. Let $0 = \tau_0 < \tau_1 < \tau_2 < \cdots$ be events of a Poisson process with parameter $\lambda$. Let at time
0 a particle commence executing a Brownian motion ($\sigma^2 = 1$) with initial position $\tau_i$. We
say the event of the Poisson process occurring at the instant $\tau_i$ is erased if the Brownian
particle starting at $\tau_i$ has position to the left of $-a$ $(a > 0)$ at the instant $t_0$ and is recorded
otherwise. Show that the processes of recorded and erased points are independent, variable
time (i.e., inhomogeneous) Poisson processes with parameters $\lambda\{1 - \Phi[(t + a)/\sqrt{t_0}]\}$ and
$\lambda\Phi[(t + a)/\sqrt{t_0}]$, respectively, where $\Phi$ is the standard normal distribution function.


27. Let $0 = \tau_0 < \tau_1 < \cdots$ be the times of successive events of a Poisson process with
parameter $\lambda$. Let $\varphi(x, y)$ be a real-valued function for $x > 0$. Consider $\eta_i = \varphi(\tau_i, \xi(\tau_i))$ where
$\{\xi(t), t \ge 0\}$ is a stochastic process independent of the $\{\tau_i\}$ process and such that for any $n$
and any $t_1, t_2, \ldots, t_n$, the random variables $\xi(t_1), \ldots, \xi(t_n)$ are independent. Define $F(x, t) =
\Pr\{\varphi(t, \xi(t)) \le x\}$. For each $t > 0$ and $r$ disjoint intervals $(a_i, b_i)$, $i = 1, 2, \ldots, r$, on $(-\infty,
+\infty)$, let $X_i(t)$ denote the number of $\tau_j$ in $(0, t)$ satisfying $a_i < \eta_j < b_i$. Show that $\{X_i(t),
t \ge 0\}$ for $i = 1, 2, \ldots, r$ are $r$ independent time varying (inhomogeneous) Poisson
processes. Show that $E[X_i(t)] = \lambda \int_0^t [F(b_i, u) - F(a_i, u)] \, du$, assuming, of course, that for each
$x$, $F(x, u)$ is an integrable function of $u$ over any finite interval.


Hint: Use the results and methods of Problems 23 and 24. 


28. In Problem 27 assume $\int_0^{\infty} F(x, u) \, du < \infty$ for each $x$. Show that the distribution
function of $X_i(t)$ tends to a Poisson random variable as $t \to \infty$. Show also that $\{N(a), -\infty <
a < \infty\}$ where $N(a_2) - N(a_1)$ is the number of $\eta_j$'s in $(a_1, a_2)$ determines a process with
independent but not necessarily stationary increments.


29. In Problem 23 assume there exists a function $h(x)$ such that for any $x_1 < x_2$

\[ \int_0^{\infty} F(x_2, u) \, du - \int_0^{\infty} F(x_1, u) \, du = \int_{x_1}^{x_2} h(t) \, dt. \]

Show that the process $\{N(a), -\infty < a < \infty\}$ defined in Problem 28 is a Poisson process
not necessarily time homogeneous with parameter $\lambda h(t)$.


30. In Problem 29 assume $h(t) = \mu$. Show that $\{N(a), -\infty < a < \infty\}$ is a one-dimensional
spatial Poisson process with parameter $\lambda\mu$.


31. Let the process $\{\xi(t), t \ge 0\}$ of Problem 27 be such that for any $n$ and any $t_1, t_2, \ldots,
t_n > 0$, $\xi(t_1), \ldots, \xi(t_n)$ are positive independent r.v.'s and possess a common distribution
function $G(x)$. Specialize the results of Problems 27-30 for the following cases:

(a) $\varphi(x, y) = xy$,

(b) $\varphi(x, y) = x + y$.

In (a) assume $1/w = \int_0^{\infty} dG(v)/v < \infty$.


32. Deduce from 31b above that for an infinite server queue ($M|G|\infty$; see Problem 10 of
Chapter 18) with a homogeneous Poisson input and arbitrary service time distribution
$G(x)$ that the output process $\{U(t), t \ge 0\}$ (i.e., the number of customers served by time $t$)
is a time-dependent Poisson process provided at $t = 0$ there are no customers in the system.
The parameter is $\lambda G(t)$.


33. Suppose cars enter a highway at instants $0 = \tau_0 < \tau_1 < \tau_2 < \cdots$ which form a
Poisson process with parameter $\lambda$. The highway is semi-infinite and cars enter it from one
end and move in the same positive direction. The car entering at time $\tau_i$ picks a velocity $v_i$
and travels with constant velocity $v_i$. We assume the $v_i$ are independent positive random
variables with common distribution function $F(x)$. Show that in the stationary case, i.e.,
after an infinite length of time, the spatial distribution of cars along the highway constitutes
a homogeneous Poisson process, provided $1/w = \int_0^{\infty} dF(v)/v < \infty$. Find the parameter.


Hint: Let $(a_1, b_1)$ and $(a_2, b_2)$ be two disjoint intervals on $(0, \infty)$. Consider the time axis
as extending from the present to the infinite past and measure the position of the cars now
from the entrance to the highway. The instants at which the cars entered the highway form
a Poisson process in this time axis. Any $\tau_i$ will be of category 1 if $a_1 < \tau_i v_i < b_1$, of category
2 if $a_2 < \tau_i v_i < b_2$, and of category 3 otherwise. Then in the notation of Problem 24

(Figure: the highway, with the entry times $\tau_i$ marked along the reversed time axis.)

\[ p_1(t) = F(b_1/t) - F(a_1/t), \]
\[ p_2(t) = F(b_2/t) - F(a_2/t), \]
\[ p_3(t) = 1 - p_1(t) - p_2(t). \]

Now use the results of Problems 23-25. Note that $\int_0^{\infty} F(x/t) \, dt = x/w$, where $w^{-1} =
\int_0^{\infty} dF(v)/v$. A similar argument applies for any finite number $(a_i, b_i)$ of nonoverlapping
intervals.


NOTES 


Material related to this chapter is contained in Bartlett [1] and Harris [2]. 

Further discussion of compounding stochastic processes with emphasis on applications 
is contained in Bartlett [3]. 

Elegant presentations of topics from the rapidly developing field of coupled and inter- 
acting stochastic processes are introduced in Preston [4] and Griffeath [5]. 

The pervasive occurrence of point processes in studies of traffic flow, in descriptions
of the distributions of astronomical bodies, in systematics for biological species, in reference
to geological formations and in reliability systems is amply illustrated in Lewis [6].


Chapter 17


FLUCTUATION THEORY OF PARTIAL 
SUMS OF INDEPENDENT 
IDENTICALLY DISTRIBUTED 
RANDOM VARIABLES 


1: The Stochastic Process of Partial Sums 


Consider independent, identically distributed real-valued random variables
$X_1, X_2, \ldots$ (not necessarily positive) and define the partial sums

\[
\begin{aligned}
S_n &= X_1 + X_2 + \cdots + X_n, \qquad n = 1, 2, \ldots, \\
S_0 &= 0 \quad \text{(by convention)}.
\end{aligned}
\tag{1.1}
\]

These partial sums $S_n$ may be graphed against $n$ and the points $(n, S_n)$ connected
by straight lines as in Fig. 1.

The process of partial sums (1.1) plays a fundamental role in many diverse
areas of applications of stochastic processes. Thus, the analysis of embedded
recurrent events in Markov processes, renewal phenomena (see Chapter 5),
queueing and dam systems, calculation of risk and ruin probabilities, etc. all
revolve on discerning properties of certain random functionals connected to
the process (1.1). We can succinctly characterize fluctuation theory as the study
of random variables of the form $f(S_0, S_1, S_2, \ldots, S_n)$ defined on the partial sums
(1.1). It is worth highlighting several random functionals of importance in the
theory and its applications. Set

\[ M_n = \max(0, S_1, S_2, \ldots, S_n), \qquad m_n = \min(0, S_1, S_2, \ldots, S_n). \tag{1.2} \]


The random variables $M_n$ keep track of the maximum of the partial sums
evolving over time. When $M_n = S_n$, then a maximum point of the realization
is manifested at the $n$th occurring partial sum, and in this case,

\[ S_j \le S_n, \qquad j = 0, 1, 2, \ldots, n - 1. \]

FIG. 1
Define also†

\[
\begin{aligned}
P_n^{+} &= \text{number of } S_i > 0, \qquad i = 1, 2, \ldots, n, \\
P_n &= \text{number of } S_i \ge 0, \qquad i = 1, 2, \ldots, n, \\
Q_n^{-} &= \text{number of } S_i < 0, \qquad i = 1, 2, \ldots, n, \\
Q_n &= \text{number of } S_i \le 0, \qquad i = 1, 2, \ldots, n.
\end{aligned}
\tag{1.3}
\]

The number $P_n^{+}/n$ evaluates the fraction of time (in the first $n$ time units) that
the process $\{S_n\}$ spends on the positive axis. This quantity is commonly referred
to as the occupation time random variable of the positive axis. The sojourn time
or occupation time of certain states are basic random variables which often
underlie the structure of the process. This is of special relevance for the analyses
of diffusion processes on the line (see Chapter 15). Sometimes we write $P_n(X)$,
$X = (X_1, X_2, \ldots, X_n)$, instead of $P_n$ to indicate that $P_n$ is evaluated for the
sequence of partial sums $0, X_1, X_1 + X_2, X_1 + X_2 + X_3, \ldots, X_1 + X_2
+ \cdots + X_n$. Similarly we will sometimes write $P_n^{+}(X)$, $Q_n(X)$, $M_n(X)$, etc.

Of importance also are the positions of the first and last maximum (minimum)
terms among the collection of partial sums $\{S_i\}_0^n$. We say that the partial
sums $S_0, S_1, S_2, \ldots, S_n$ exhibit their last maximum at position $k$ $(0 \le k \le n)$ if

\[ S_k \ge S_j \quad \text{for all } j = 0, 1, \ldots, k - 1 \]

and

\[ S_k > S_j \quad \text{for all } j = k + 1, k + 2, \ldots, n. \]

Analogously the index of the first maximum is that value $l$ satisfying

\[
\begin{aligned}
S_l &> S_i, \qquad i = 0, 1, 2, \ldots, l - 1, \\
S_l &\ge S_i, \qquad i = l + 1, \ldots, n.
\end{aligned}
\]

† It is more traditional to use the notation $N_n$ in place of $P_n$.

The first and last minima are defined in a parallel fashion or equivalently by
observing that the position $k$ is a first (last) minimum for the partial sums $S_0,
S_1, \ldots, S_n$ if and only if $k$ is a first (last) maximum for the corresponding partial
sums of the sequence $\{-X_i\}_1^n$. It is now convenient to introduce some further
notation. Define the random variables

\[
\begin{aligned}
L_{nn} &= \text{position (i.e., the index) of the last maximum among } \{S_0, S_1, \ldots, S_n\}, \\
L_{n0} &= \text{position of the first maximum among } \{S_0, S_1, \ldots, S_n\}, \\
K_{n0} &= \text{position of the first minimum among } \{S_0, S_1, \ldots, S_n\}, \\
K_{nn} &= \text{position of the last minimum among } \{S_0, S_1, \ldots, S_n\}.
\end{aligned}
\tag{1.4}
\]
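
The functionals introduced in (1.2) through (1.4) are easy to compute for a concrete sequence. The sketch below does so for one hypothetical sequence of increments. The function names and dictionary keys are illustrative assumptions, and the four counts are labeled by the inequality they use rather than by the symbols of (1.3).

```python
from itertools import accumulate

def partial_sums(xs):
    """Return [S_0, S_1, ..., S_n] with S_0 = 0 by convention."""
    return [0] + list(accumulate(xs))

def fluctuation_summary(xs):
    """Maximum, minimum, sign counts over S_1..S_n, and positions of first/last extrema."""
    s = partial_sums(xs)
    top, bottom = max(s), min(s)       # M_n and m_n, since S_0 = 0 is included
    last = len(s) - 1
    return {
        "max": top,
        "min": bottom,
        "strictly_positive": sum(1 for v in s[1:] if v > 0),
        "nonnegative": sum(1 for v in s[1:] if v >= 0),
        "strictly_negative": sum(1 for v in s[1:] if v < 0),
        "nonpositive": sum(1 for v in s[1:] if v <= 0),
        "first_max": s.index(top),
        "last_max": last - s[::-1].index(top),
        "first_min": s.index(bottom),
        "last_min": last - s[::-1].index(bottom),
    }

xs = [1, -2, 3, -1, -1, 2]             # hypothetical increments X_1, ..., X_6
summary = fluctuation_summary(xs)
negated = fluctuation_summary([-x for x in xs])
```

Here the partial sums are 0, 1, -1, 2, 1, 0, 2, so the maximum 2 is first attained at index 3 and last attained at index 6. Applying the same function to the negated increments turns the minima into maxima, as described in the text.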


There are a number of key relationships among the random variables of
(1.3) and (1.4) emanating from certain symmetry and combinatoric
considerations. In the next section we develop a fundamental equivalence principle, and
the main identities for the distributions of $M_n$, $P_n^{+}$, etc. are delineated in the
succeeding sections. Several applications of the identities are set forth in Secs. 3
and 6.

One of the principal objectives of fluctuation theory is to ascertain the
distribution of $M_n$ in terms of the convolution distribution functions $F^{(j)}$,
$j = 1, 2, \ldots, n$, of the random variables $S_j$, respectively. The task will be carried
out in Theorems 5.2 and 5.3 of Sec. 5.


2: An Equivalence Principle 


Distribution relationships for the random variables of (1.3) and (1.4) are
developed first for a very special sample space designated as $\mathcal{B}$ and later the
identities are validated in the general framework of an arbitrary sequence of
i.i.d. (independent identically distributed) random variables. Specify
$y = (y_1, y_2, \ldots, y_n)$ as a fixed ordered n-tuple of real numbers. Let $\Sigma = \{\sigma\}$ consist
of the collection of all permutations $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ carrying $(1, 2, \ldots, n)$
into itself (the permutation group on n elements). Construct $\mathcal{B}$ to be comprised
of the n! points

$$\mathcal{B} = \{\sigma y = (y_{\sigma_1}, y_{\sigma_2}, \ldots, y_{\sigma_n});\ \sigma \text{ traversing } \Sigma\}. \tag{2.1}$$

To illustrate, where n = 3 the set $\mathcal{B}$ consists of the triplets

$$(y_1, y_2, y_3),\ (y_1, y_3, y_2),\ (y_2, y_1, y_3),\ (y_2, y_3, y_1),\ (y_3, y_1, y_2),\ (y_3, y_2, y_1).$$

Assign to each member of $\mathcal{B}$ the same probability 1/n!, thereby converting $\mathcal{B}$
into a probability space. Each $\sigma y \in \mathcal{B}$ furnishes a possible value of the random
vector $X = (X_1, X_2, \ldots, X_n)$, where $X_k(\sigma y) = y_{\sigma_k}$. Assuming $y_1, \ldots, y_n$ to be
distinct values, $X_k$ is uniformly distributed for any $k = 1, 2, \ldots, n$;

$$\Pr\{X_k(\sigma y) = y_i\} = \frac{1}{n}, \qquad i, k = 1, 2, \ldots, n, \tag{2.2}$$

since all permutations are equally likely.

Patently, $X_1, \ldots, X_n$ are not independent random variables. It is well worth
noting, however, that they are exchangeable; that is, $X_1, \ldots, X_n$ have the same
joint distribution as any permutation $X_{\sigma_1}, \ldots, X_{\sigma_n}$.

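The quantities $S_k(\sigma y)$, $P_n(\sigma y)$, $L_{nn}(\sigma y)$, etc. can be viewed as particular values
of the random variables $S_k(X)$, $P_n(X)$, $L_{nn}(X)$, etc. For example, again assuming
$y_1, \ldots, y_n$ to be distinct, $n!\,\Pr\{P_n(X) = k\}$ is just the number of permutations $\sigma$
with the property that exactly k among the partial sums

$$S_1(\sigma y) = y_{\sigma_1}, \quad S_2(\sigma y) = y_{\sigma_1} + y_{\sigma_2},$$
$$S_3(\sigma y) = y_{\sigma_1} + y_{\sigma_2} + y_{\sigma_3}, \quad \ldots, \quad S_n(\sigma y) = y_{\sigma_1} + \cdots + y_{\sigma_n}$$

(the partial sums induced by the point $\sigma y = (y_{\sigma_1}, y_{\sigma_2}, \ldots, y_{\sigma_n})$) are nonnegative.
This may be written

$$n!\,\Pr\{P_n(X) = k\} = \sum_{\sigma} I_{\{P_n = k\}}(\sigma y), \tag{2.3}$$

where the summation runs over all n! permutations of $(1, 2, \ldots, n)$ and

$$I_{\{P_n = k\}}(\sigma y) = \begin{cases} 1 & \text{if } P_n(\sigma y) = k, \\ 0 & \text{otherwise;} \end{cases} \tag{2.4}$$

i.e., $I_{\{P_n = k\}}(\cdot)$ is the indicator function of the set $\{P_n = k\}$.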


Lemma 2.1 (The Equivalence Principle for the Special Sample Space $\mathcal{B}$). Let
$y = (y_1, y_2, \ldots, y_n)$ be fixed and define the random variables $X_1, X_2, \ldots, X_n$ on
$\mathcal{B}$ by the prescriptions $X_k(\sigma y) = y_{\sigma_k}$, $k = 1, 2, \ldots, n$. Then

$$\Pr\{P_n = k\} = \Pr\{L_{nn} = k\}, \tag{2.5}$$
$$\Pr\{P_n^* = k\} = \Pr\{L_{n0} = k\}, \tag{2.6}$$
$$\Pr\{Q_n = k\} = \Pr\{K_{n0} = k\}, \tag{2.7}$$
$$\Pr\{Q_n^* = k\} = \Pr\{K_{nn} = k\}, \tag{2.8}$$
$$\Pr\{L_{n0} = k\} = \Pr\{K_{nn} = n - k\}, \tag{2.9}$$
$$\Pr\{L_{nn} = k\} = \Pr\{K_{n0} = n - k\}, \tag{2.10}$$

where $P_n$ means $P_n(X)$, $P_n^*$ abbreviates $P_n^*(X)$, etc.

Proof. We first prove (2.9) and (2.10). Let $\tau$ be the special permutation that
reverses the order of $(1, 2, \ldots, n)$, i.e., that transforms it into $(n, n - 1, \ldots, 1)$.
If $x = (x_1, x_2, \ldots, x_n)$ then $\tau x = (x_n, x_{n-1}, \ldots, x_1)$. Now observe that

$$S_j(\tau x) = S_n(x) - S_{n-j}(x). \tag{2.11}$$

Further, by definition of $K_{nn}(\tau x)$, we have

$$S_{K_{nn}(\tau x)}(\tau x) < S_j(\tau x) \quad \text{for all } j = K_{nn}(\tau x) + 1, \ldots, n$$

and

$$S_{K_{nn}(\tau x)}(\tau x) \le S_j(\tau x) \quad \text{for all } j = 0, 1, \ldots, K_{nn}(\tau x) - 1.$$

By means of (2.11) these inequalities may be recast as

$$S_{n - K_{nn}(\tau x)}(x) > S_j(x) \quad \text{for all } j = 0, 1, \ldots, n - K_{nn}(\tau x) - 1$$

and

$$S_{n - K_{nn}(\tau x)}(x) \ge S_j(x) \quad \text{for all } j = n - K_{nn}(\tau x) + 1, \ldots, n.$$

These relations determine that $n - K_{nn}(\tau x)$ is the index value delimiting $L_{n0}(x)$.
Thus

$$L_{n0}(x) = n - K_{nn}(\tau x). \tag{2.12}$$

In a similar fashion, from the definitions of $L_{nn}$ and $K_{n0}$ and with the aid of
(2.11), we deduce

$$L_{nn}(x) = n - K_{n0}(\tau x). \tag{2.13}$$

Since (2.12) and (2.13) persist for any n-tuple $x \in \mathcal{B}$, these identities apply
in particular for x = y.

Now, as both events $X = y$ and $X = \tau y$ have the same probability, namely,
1/n!, (2.12) and (2.13) imply equations (2.9) and (2.10), respectively, of the
lemma.

The proofs of (2.5)-(2.8) proceed by induction. For n = 1 these assertions
are immediate. Assume that (2.5)-(2.8) prevail for any fixed (n − 1)-tuple of real
numbers. We advance the induction by demonstrating that these equations are
maintained for any n-tuple $y = (y_1, \ldots, y_n)$. To this end, it is convenient
to distinguish three separate cases.

Case 1: $S_n(y) = y_1 + \cdots + y_n < 0$. Then the random variables $P_n(X)$,
$L_{n0}(X)$, $L_{nn}(X)$, and $P_n^*(X)$ cannot take the value n. Let us examine
$\Pr\{P_n = k \mid X_n = y_j\}$. Under the conditioning the sample space mimics the
original sample space but deals with a symmetric set of (n − 1)-tuples where each
point carries probability 1/(n − 1)!. The interpretation of $\Pr\{L_{nn} = k \mid X_n = y_j\}$
is similar. We can accordingly invoke the induction hypothesis for
$0 \le k \le n - 1$, obtaining

$$\Pr\{P_n = k \mid X_n = y_j\} = \Pr\{L_{nn} = k \mid X_n = y_j\}$$

and similarly

$$\Pr\{P_n^* = k \mid X_n = y_j\} = \Pr\{L_{n0} = k \mid X_n = y_j\}.$$

By the law of total probabilities, these equations immediately imply (2.5) and
(2.6) for n-tuples. To establish (2.7) and (2.8), observe from the definitions that
$\Pr\{Q_n = k\} = \Pr\{P_n = n - k\}$ and $\Pr\{Q_n^* = k\} = \Pr\{P_n^* = n - k\}$ and then
we use (2.5), (2.6), (2.9), and (2.10). This gives

$$\Pr\{Q_n = k\} = \Pr\{P_n = n - k\} = \Pr\{L_{nn} = n - k\} = \Pr\{K_{n0} = k\},$$
$$\Pr\{Q_n^* = k\} = \Pr\{P_n^* = n - k\} = \Pr\{L_{n0} = n - k\} = \Pr\{K_{nn} = k\}. \tag{2.14}$$

Case 2: $S_n(y) = y_1 + \cdots + y_n > 0$. Now, $Q_n(X)$, $Q_n^*(X)$, $K_{n0}$, and $K_{nn}$
cannot take the value n. Applying the induction assumption, we have

$$\Pr\{Q_n = k \mid X_n = y_j\} = \Pr\{K_{n0} = k \mid X_n = y_j\},$$
$$\Pr\{Q_n^* = k \mid X_n = y_j\} = \Pr\{K_{nn} = k \mid X_n = y_j\}.$$

Therefore (2.7) and (2.8) follow by the law of total probabilities. Now, with
(2.7)-(2.10) already proved, we obtain

$$\Pr\{P_n = k\} = \Pr\{Q_n = n - k\} = \Pr\{K_{n0} = n - k\} = \Pr\{L_{nn} = k\},$$
$$\Pr\{P_n^* = k\} = \Pr\{Q_n^* = n - k\} = \Pr\{K_{nn} = n - k\} = \Pr\{L_{n0} = k\}. \tag{2.15}$$

This proves (2.5) and (2.6).

Case 3: $S_n(y) = y_1 + \cdots + y_n = 0$. Here, $P_n^*(X)$, $Q_n(X)$, $L_{n0}(X)$, and
$K_{n0}(X)$ cannot take the value n. Thus induction proves Eqs. (2.6) and (2.7) just
as it did in Cases 1 and 2, respectively. Then (2.14) implies (2.8) and (2.15)
entails (2.5). This completes the proof of Lemma 2.1. ∎
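Lemma 2.1 lends itself to a brute-force check for small n. The Python sketch below is an illustration only: the function names and the sample tuple are hypothetical choices, not part of the text. It enumerates all n! orderings of a fixed tuple y (each carrying probability 1/n!) and tallies how often each functional takes each value. Here $P_n$ counts the nonnegative sums among $S_1, \ldots, S_n$ and $P_n^*$ the strictly positive ones; only (2.5), (2.6), (2.9), and (2.10) are compared.

```python
from collections import Counter
from itertools import permutations


def partial_sums(x):
    """Return [S_0, S_1, ..., S_n] with S_0 = 0."""
    sums = [0]
    for value in x:
        sums.append(sums[-1] + value)
    return sums


def path_functionals(x):
    """Compute P_n, P_n^*, L_nn, L_n0, K_n0, K_nn for one ordered tuple x."""
    s = partial_sums(x)
    top, bottom = max(s), min(s)
    at_max = [j for j, v in enumerate(s) if v == top]
    at_min = [j for j, v in enumerate(s) if v == bottom]
    return {
        "P": sum(v >= 0 for v in s[1:]),     # nonnegative sums among S_1..S_n
        "Pstar": sum(v > 0 for v in s[1:]),  # strictly positive sums
        "Lnn": at_max[-1],                   # index of the last maximum
        "Ln0": at_max[0],                    # index of the first maximum
        "Kn0": at_min[0],                    # index of the first minimum
        "Knn": at_min[-1],                   # index of the last minimum
    }


def permutation_counts(y):
    """Count, over all n! orderings of y, how often each functional equals k."""
    counts = {}
    for ordering in permutations(y):
        for name, value in path_functionals(ordering).items():
            counts.setdefault(name, Counter())[value] += 1
    return counts


y = (3, -1, -4, 2, 0)  # hypothetical sample tuple
n = len(y)
c = permutation_counts(y)
print("(2.5)", c["P"] == c["Lnn"])
print("(2.6)", c["Pstar"] == c["Ln0"])
print("(2.9)", all(c["Ln0"][k] == c["Knn"][n - k] for k in range(n + 1)))
print("(2.10)", all(c["Lnn"][k] == c["Kn0"][n - k] for k in range(n + 1)))
```

Dividing each count by n! gives the probabilities appearing in the lemma; working with raw counts keeps the comparison exact.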


for all j. 


Let $g(x_1, x_2, \ldots, x_n)$ denote a function symmetric in the variables
$(x_1, x_2, \ldots, x_n)$, which in formal language asserts the equation

$$g(x) = g(\sigma x)$$

for every permutation $\sigma \in \Sigma$.

Lemma 2.1 dealt with a fixed n-tuple and the sample space generated by the 
permutations thereof. The following theorem extends the pertinent identities 
to general n-vectors composed from independent identically distributed random 
variables (i.i.d. r.v.’s). 




Theorem 2.1 (The Equivalence Principle). Let $X_1, \ldots, X_n$ be independent
identically distributed random variables and let $g_n(x)$ be a symmetric function of
$x = (x_1, \ldots, x_n)$. Then for any k, $0 \le k \le n$,

$$\begin{aligned}
E[g_n; P_n = k] &= E[g_n; L_{nn} = k], \\
E[g_n; P_n^* = k] &= E[g_n; L_{n0} = k], \\
E[g_n; Q_n = k] &= E[g_n; K_{n0} = k], \\
E[g_n; Q_n^* = k] &= E[g_n; K_{nn} = k], \\
E[g_n; L_{n0} = k] &= E[g_n; K_{nn} = n - k], \\
E[g_n; L_{nn} = k] &= E[g_n; K_{n0} = n - k].
\end{aligned} \tag{2.16}$$

Note. The above expectations mean that they are taken not over the original
sample space but only over the indicated subset. Specifically, if $\mathcal{E}$ is any event,
then

$$E[g_n; \mathcal{E}] = \int_{\mathcal{E}} g_n(x)\, dF(x),$$

where F(x) is the distribution of the random vector $X = (X_1, \ldots, X_n)$, and the
expectation of $g_n(x)$ is extended only over those realizations of X which belong
to $\mathcal{E}$.


Proof. Because the sequel makes primary use only of the first equation of
(2.16), we present the proof of this identity in full detail. The other equations are
proved completely analogously.

Consider

$$E[g_n; P_n = k] = \int_{\{P_n(x) = k\}} g_n(x)\, dF(x) = \int g_n(x)\, I_{\{P_n = k\}}(x)\, dF(x),$$

where the notation of (2.4) is employed. Make the change of variable $x \to \sigma x$ for
some prescribed permutation $\sigma$. Since X is composed from independent
identically distributed random variables, it is clear that $F(\sigma x) = F(x)$ and
$g_n(\sigma x) = g_n(x)$ because by stipulation $g_n(x)$ is symmetric.

Therefore, we have

$$E[g_n; P_n = k] = \int g_n(x)\, I_{\{P_n = k\}}(\sigma x)\, dF(x)$$

for every permutation $\sigma$. Averaging with respect to $\sigma \in \Sigma$ (running over the set
of all permutations) leads to the equation

$$E[g_n; P_n = k] = \frac{1}{n!} \sum_{\sigma} \int g_n(x)\, I_{\{P_n = k\}}(\sigma x)\, dF(x).$$

Next, interchanging summation and integration yields

$$E[g_n; P_n = k] = \int g_n(x)\, \frac{1}{n!} \sum_{\sigma} I_{\{P_n = k\}}(\sigma x)\, dF(x). \tag{2.17}$$

Now consult Eq. (2.3) and then appeal to formula (2.5) of Lemma 2.1 to validate
the equations

$$\frac{1}{n!} \sum_{\sigma} I_{\{P_n = k\}}(\sigma x) = \Pr\{P_n = k\} = \Pr\{L_{nn} = k\} = \frac{1}{n!} \sum_{\sigma} I_{\{L_{nn} = k\}}(\sigma x).$$

The last equality ensues on the same basis as (2.3). Inserting the above formula
into (2.17) and working backwards as leading to (2.17), we obtain

$$\begin{aligned}
E[g_n; P_n = k] &= \int g_n(x)\, \frac{1}{n!} \sum_{\sigma} I_{\{L_{nn} = k\}}(\sigma x)\, dF(x) \\
&= \frac{1}{n!} \sum_{\sigma} \int g_n(x)\, I_{\{L_{nn} = k\}}(\sigma x)\, dF(x) \\
&= \int g_n(x)\, I_{\{L_{nn} = k\}}(x)\, dF(x) = E[g_n; L_{nn} = k].
\end{aligned}$$

The first identity of (2.16) is established. The other equations are proved in the
same way. ∎


Remark 2.1. The assumption that the components of $X = (X_1, \ldots, X_n)$ are
i.i.d. was needed in the proof of Theorem 2.1 only to the extent that the joint
distribution function F(x) is symmetric in the sense that $F(\sigma x) = F(x)$ holds
for every $\sigma \in \Sigma$ (see (2.17)). A random vector X with this property is said to be
exchangeable.


We recover a generalized form of the identities in Lemma 2.1 by prescribing
$g_n(x) \equiv 1$. For example, we have

$$\Pr\{P_n(X) = k\} = \Pr\{L_{nn}(X) = k\}, \qquad \Pr\{P_n^*(X) = k\} = \Pr\{L_{n0}(X) = k\},$$

etc., where the probabilities are computed based on a random vector X
symmetrically distributed.

The most important specification of $g_n$ in (2.16) to be featured in our later
developments is the function

$$g_n(x) = \exp\{iu(x_1 + x_2 + \cdots + x_n)\}, \tag{2.18}$$

where u is a real parameter.
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3: Some Fundamental Identities of Fluctuation Theory and 
Direct Applications 


A remarkable series of identities will be proved in the remainder of this chapter.
To state them we introduce the generating function

$$P(u; t) = \sum_{n=0}^{\infty} E[e^{iuS_n}; L_{nn} = n]\, t^n = \sum_{n=0}^{\infty} t^n \int_{\{L_{nn} = n\}} e^{iuS_n}\, dF_n(x), \qquad |t| < 1, \tag{3.1}$$

where $F_n$ is the joint distribution function of the random vector
$X = (X_1, X_2, \ldots, X_n)$ and u is an arbitrary real parameter. It is customary to regard
(3.1) as a double generating function since the characteristic function
$E(e^{iuS_n}; L_{nn} = n)$ is also a sort of "generating" function of the random variable $S_n$
appropriately restricted. The event $L_{nn} = n$ imposes a confinement of the
realizations to obey the conditions $\{S_i \le S_n,\ i = 0, 1, 2, \ldots, n\}$. We define parallel
to (3.1)

$$Q(u; t) = \sum_{n=0}^{\infty} E[e^{iuS_n}; L_{nn} = 0]\, t^n \qquad (\text{here } L_{00} = 0 \text{ by definition}). \tag{3.2}$$


The content of the next theorem embraces two typical identities of fluctuation
theory.

Theorem 3.1. For every real u and t, |t| < 1, we have

$$P(u; t) = \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} E[e^{iuS_k}; S_k \ge 0] \right\}, \tag{3.3}$$

$$Q(u; t) = \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} E[e^{iuS_k}; S_k < 0] \right\}. \tag{3.4}$$

The right-hand sides involve simply the distributions $F^{(k)}(x)$ of $S_k$, the
k-fold convolution of F(x), suitably restricting the range of outcomes. Explicitly,

$$E[e^{iuS_k}; S_k \ge 0] = \int_{0-}^{\infty} e^{iu\xi}\, dF^{(k)}(\xi)$$

(notice that the integration includes the contribution at 0) and

$$E[e^{iuS_k}; S_k < 0] = \int_{-\infty}^{0-} e^{iu\xi}\, dF^{(k)}(\xi)$$

(the last integral does not involve the probability at zero). In principle, the
series appearing in the exponent on the right of (3.3) and (3.4) are accessible to
calculation.

The proof of Theorem 3.1 is elaborated in Section 5. For the moment, it is
convenient to denote the right-hand side of (3.3) by $f_+(u, t)$ and that of (3.4) by
$f_-(u, t)$.

The next identity, called Spitzer's identity, involving the maximal process
$\{M_n\}$, see (1.2), is basic in the study of a wide assortment of applied probability
models, including queueing theory, branching processes, and renewal events,
among others.

Theorem 3.2. For |t| < 1, t and u real,

$$\sum_{n=0}^{\infty} E[e^{iuM_n}]\, t^n = \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} E[e^{iuS_k}; S_k \ge 0] \right\} \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} \Pr\{S_k < 0\} \right\} = f_+(u, t)\, f_-(0, t). \tag{3.5}$$


The proof of Theorem 3.2 appears in Section 5. 


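It is useful to record the following elementary fact:

$$f_+(u, t)\, f_-(u, t) = \frac{1}{1 - t\varphi(u)}, \tag{3.6}$$

where $\varphi(u) = E[e^{iuX_1}]$ is the common characteristic function of the $X_i$. We verify
(3.6) directly. Indeed,

$$\begin{aligned}
f_+(u, t)\, f_-(u, t) &= \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} E[e^{iuS_k}; S_k \ge 0] \right\} \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} E[e^{iuS_k}; S_k < 0] \right\} \\
&= \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} \left( E[e^{iuS_k}; S_k \ge 0] + E[e^{iuS_k}; S_k < 0] \right) \right\} \\
&= \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} E[e^{iuS_k}] \right\} \\
&= \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} [\varphi(u)]^k \right\} = \exp\{-\log(1 - t\varphi(u))\} \qquad \text{since } -\log(1 - x) = \sum_{k=1}^{\infty} \frac{x^k}{k} \\
&= \frac{1}{1 - t\varphi(u)}.
\end{aligned}$$

In the remainder of this section we shall prove one key lemma and content
ourselves with extracting a series of direct applications of the identities hitherto
displayed.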
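The factorization (3.6) can also be checked numerically. The following Python sketch makes an assumption that is not in the text: the steps take the values +1 and −1 with probabilities p and 1 − p. It builds the law of $S_k$ step by step, accumulates the two restricted series of (3.3) and (3.4) up to a finite number of terms, and returns the truncated $f_+(u, t)$ and $f_-(u, t)$. The function name and parameters are illustrative.

```python
import cmath


def restricted_cf_series(p, u, t, terms=200):
    """Return truncated (f_plus, f_minus) for steps +1 (prob p) and -1 (prob 1 - p)."""
    q = 1.0 - p
    dist = {0: 1.0}  # law of S_0
    plus = 0j        # sum of t^k/k * E[exp(iuS_k); S_k >= 0]
    minus = 0j       # sum of t^k/k * E[exp(iuS_k); S_k < 0]
    for k in range(1, terms + 1):
        nxt = {}
        for s, pr in dist.items():
            nxt[s + 1] = nxt.get(s + 1, 0.0) + pr * p
            nxt[s - 1] = nxt.get(s - 1, 0.0) + pr * q
        dist = nxt  # law of S_k
        weight = t ** k / k
        plus += weight * sum(
            pr * cmath.exp(1j * u * s) for s, pr in dist.items() if s >= 0
        )
        minus += weight * sum(
            pr * cmath.exp(1j * u * s) for s, pr in dist.items() if s < 0
        )
    return cmath.exp(plus), cmath.exp(minus)


p, u, t = 0.3, 0.7, 0.6  # hypothetical parameter values
f_plus, f_minus = restricted_cf_series(p, u, t)
phi = p * cmath.exp(1j * u) + (1 - p) * cmath.exp(-1j * u)
print(abs(f_plus * f_minus - 1 / (1 - t * phi)))  # truncation and rounding error only
```

Because |t| < 1, the neglected tail of each series is geometrically small, so a few hundred terms suffice for moderate t.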
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Lemma 3.1. For eachk = 0,1, 2,...,n 
$$\Pr\{L_{nn} = k\} = \Pr\{L_{kk} = k\}\, \Pr\{L_{n-k,n-k} = 0\}, \tag{3.7}$$

$$\Pr\{P_n = k\} = \Pr\{P_k = k\}\, \Pr\{P_{n-k} = 0\}. \tag{3.8}$$

Proof. We concentrate first on (3.7). (Figure 2 is helpful.) The first step comes
from the definition

$$\begin{aligned}
\Pr\{L_{nn} = k\} &= \Pr\{S_k \ge S_j,\ j = 0, 1, \ldots, k - 1;\ S_k > S_i,\ i = k + 1, \ldots, n\} \\
&= \Pr\{S_k \ge S_j,\ j = 0, 1, \ldots, k;\ S_i - S_k < 0,\ i = k + 1, \ldots, n\} \\
&= \Pr\{S_k \ge S_j,\ j = 0, 1, \ldots, k\} \times \Pr\{X_{k+1} + \cdots + X_i < 0,\ i = k + 1, \ldots, n\} \quad \text{(by independence)} \\
&= \Pr\{L_{kk} = k\}\, \Pr\{S_j < 0,\ j = 1, 2, \ldots, n - k\} \quad \text{(since the } X_i \text{ are identically distributed)} \\
&= \Pr\{L_{kk} = k\}\, \Pr\{L_{n-k,n-k} = 0\},
\end{aligned}$$

and (3.7) is validated. The identity (3.8) emanates by virtue of the equivalence
principle equation (2.16) for $g \equiv 1$. ∎
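For a step distribution with finitely many values, (3.7) and (3.8) can be confirmed exactly by enumerating every path. The sketch below is illustrative only: the step law STEPS and all function names are hypothetical. It uses exact rational arithmetic, so the comparisons are equalities rather than approximations.

```python
from fractions import Fraction
from itertools import product

# Hypothetical step law (value -> probability); any finite law summing to 1 works.
STEPS = {-1: Fraction(1, 2), 0: Fraction(1, 6), 2: Fraction(1, 3)}


def law_of(functional, n):
    """Exact law {k: Pr{functional = k}} over all step sequences of length n."""
    law = {}
    for path in product(STEPS, repeat=n):
        prob = Fraction(1)
        for step in path:
            prob *= STEPS[step]
        k = functional(path)
        law[k] = law.get(k, Fraction(0)) + prob
    return law


def last_max(path):
    """L_nn: index of the last maximum among S_0, ..., S_n."""
    s, best, arg = 0, 0, 0
    for j, step in enumerate(path, start=1):
        s += step
        if s >= best:  # ties move the index forward, giving the last maximum
            best, arg = s, j
    return arg


def nonneg_count(path):
    """P_n: number of nonnegative sums among S_1, ..., S_n."""
    s, count = 0, 0
    for step in path:
        s += step
        count += s >= 0
    return count


def factorization_holds(functional, n):
    """Check Pr{F_n = k} = Pr{F_k = k} Pr{F_{n-k} = 0} for every k = 0..n."""
    zero = Fraction(0)
    full = law_of(functional, n)
    return all(
        full.get(k, zero)
        == law_of(functional, k).get(k, zero) * law_of(functional, n - k).get(0, zero)
        for k in range(n + 1)
    )


print(factorization_holds(last_max, 5))      # identity (3.7)
print(factorization_holds(nonneg_count, 5))  # identity (3.8)
```

The enumeration grows like 3^n for this three-point law, so it is practical only for small n.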


FIG. 2


462 17. FLUCTUATION THEORY OF SUMS 


Paraphrasing the above argument, mutatis mutandis, or applying the
reversal permutation $\tau$ to $(X_1, \ldots, X_n)$ (see the proof of Lemma 2.1), we secure
the following result:

Lemma 3.2

$$\Pr\{L_{n0} = k\} = \Pr\{L_{k0} = k\}\, \Pr\{L_{n-k,0} = 0\}$$

and

$$\Pr\{P_n^* = k\} = \Pr\{P_k^* = k\}\, \Pr\{P_{n-k}^* = 0\}, \qquad k = 0, 1, 2, \ldots, n. \tag{3.9}$$

To dramatize the utility and agility of the identities (3.3)-(3.5) a number of
immediate applications will be highlighted.
A. COMPUTATION OF $\Pr\{P_n = k\}$ IN CERTAIN CASES

We concentrate on the special but important situation of

$$\Pr\{S_k \ge 0\} = \tfrac{1}{2}, \qquad k = 1, 2, 3, \ldots. \tag{3.10}$$

For example, when $\{X_i\}$ are real symmetric random variables having continuous
density function then certainly the random variables $S_k$ inherit the same property
and (3.10) prevails.

Setting u = 0 in the basic relation (3.3) for P(u; t) yields

$$\sum_{n=0}^{\infty} \Pr\{L_{nn} = n\}\, t^n = \exp\left( \sum_{k=1}^{\infty} \frac{t^k}{k} \Pr\{S_k \ge 0\} \right). \tag{3.11}$$

Since $\Pr\{S_k \ge 0\} = \tfrac{1}{2}$, by assumption, and taking account of the elementary
formula $1 - t = \exp(-\sum_{k=1}^{\infty} t^k/k)$, the previous equation reduces to

$$\sum_{n=0}^{\infty} \Pr\{L_{nn} = n\}\, t^n = (1 - t)^{-1/2}. \tag{3.12}$$

The familiar binomial expansion reads

$$(1 - t)^{-1/2} = \sum_{n=0}^{\infty} \binom{-\tfrac{1}{2}}{n} (-t)^n. \tag{3.13}$$

Equating coefficients in (3.12) and (3.13) produces the evaluation

$$\Pr\{L_{nn} = n\} = (-1)^n \binom{-\tfrac{1}{2}}{n} = \frac{(2n)!}{2^{2n} (n!)^2} = \binom{2n}{n} \frac{1}{2^{2n}}. \tag{3.14}$$
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The equivalence principle (2.16) also gives 


$$\Pr\{P_n = n\} = \Pr\{L_{nn} = n\} = \binom{2n}{n} \frac{1}{2^{2n}}. \tag{3.15}$$

Following the same course, by means of (3.4), setting u = 0 in Q(u; t) with
obvious evaluations leads to the result

$$\Pr\{L_{nn} = 0\} = \Pr\{P_n = 0\} = \binom{2n}{n} \frac{1}{2^{2n}}. \tag{3.16}$$

With the expressions (3.15) and (3.16) at hand, we can determine the complete
distribution of $P_n$ (subject to the stipulation of (3.10)) exploiting the identity of
(3.8). Thus,

$$\begin{aligned}
\Pr\{P_n = k\} &= \Pr\{L_{nn} = k\} \\
&= \Pr\{L_{kk} = k\}\, \Pr\{L_{n-k,n-k} = 0\} \\
&= \frac{1}{2^{2n}} \binom{2k}{k} \binom{2n - 2k}{n - k}.
\end{aligned} \tag{3.17}$$
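Formula (3.17) is easy to tabulate. The short Python sketch below (the function name is an illustrative choice) evaluates the law of $P_n$ in exact arithmetic under assumption (3.10), which makes it straightforward to confirm that the probabilities sum to one and that the end values k = 0 and k = n reproduce (3.16) and (3.15).

```python
from fractions import Fraction
from math import comb


def positive_sum_law(n):
    """Return [Pr{P_n = k} for k = 0..n] from (3.17), valid under assumption (3.10)."""
    return [
        Fraction(comb(2 * k, k) * comb(2 * (n - k), n - k), 4 ** n)
        for k in range(n + 1)
    ]


law = positive_sum_law(10)
print(sum(law))                    # total probability
print(law[0], law[5], law[10])     # end values versus the central value
```

The end values exceed the central one, which anticipates the U-shaped limit law discussed in Section 6.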


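B. THE EXPECTATION OF THE MAXIMUM $M_n$

We write the identity (3.5) in the form

$$\sum_{n=0}^{\infty} E[e^{iuM_n}]\, t^n = \exp\left( \sum_{k=1}^{\infty} \frac{t^k}{k} E[e^{iuS_k^+}] \right),$$

where $S_k^+ = \max\{S_k, 0\}$, and differentiate both sides with respect to u, and
afterwards set u = 0 to obtain

$$\sum_{n=0}^{\infty} E[M_n]\, t^n = \exp\left( \sum_{k=1}^{\infty} \frac{t^k}{k} \right) \sum_{k=1}^{\infty} \frac{t^k}{k} E[S_k^+]. \tag{3.18}$$

Now $\exp\left( \sum_{k=1}^{\infty} t^k/k \right) = (1 - t)^{-1} = \sum_{j=0}^{\infty} t^j$, and the power series
multiplication formula

$$\left( \sum_{i=0}^{\infty} a_i t^i \right) \left( \sum_{j=0}^{\infty} b_j t^j \right) = \sum_{n=0}^{\infty} \left( \sum_{k=0}^{n} a_k b_{n-k} \right) t^n$$

applied in (3.18) shows that

$$\sum_{n=0}^{\infty} E[M_n]\, t^n = \sum_{j=0}^{\infty} t^j \sum_{k=1}^{\infty} \frac{t^k}{k} E[S_k^+] = \sum_{n=1}^{\infty} \left( \sum_{k=1}^{n} \frac{1}{k} E[S_k^+] \right) t^n.$$

We equate the corresponding coefficients to conclude

$$E[M_n] = \sum_{k=1}^{n} \frac{1}{k} E[S_k; S_k \ge 0] = \sum_{k=1}^{n} \frac{1}{k} E[S_k^+].$$

As n increases, the random variable $M_n$ increases to the random variable

$$M = \sup\{S_0, S_1, S_2, \ldots\} \qquad (\infty \text{ values are possible})$$

and in all cases the limit relation

$$E[M] = \sum_{k=1}^{\infty} \frac{1}{k} E[S_k; S_k \ge 0]$$

obtains. In particular, if $\sum_{k=1}^{\infty} (1/k) E[S_k; S_k \ge 0] < \infty$, then $E[M] < \infty$.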
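The finite-n formula for $E[M_n]$ can be confirmed by direct enumeration when the steps take finitely many values. In the Python sketch below the step law STEPS and the function names are hypothetical; the left side averages the running maximum $\max\{0, S_1, \ldots, S_n\}$ over all paths, while the right side sums $(1/k)E[S_k^+]$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical step law (value -> probability) with negative drift.
STEPS = {-2: Fraction(1, 4), -1: Fraction(1, 4), 1: Fraction(1, 2)}


def expectation(functional, n):
    """Exact expectation of functional(path) over all step sequences of length n."""
    total = Fraction(0)
    for path in product(STEPS, repeat=n):
        prob = Fraction(1)
        for step in path:
            prob *= STEPS[step]
        total += prob * functional(path)
    return total


def running_max(path):
    """M_n = max(0, S_1, ..., S_n)."""
    s, best = 0, 0
    for step in path:
        s += step
        best = max(best, s)
    return best


def positive_part_of_sum(path):
    """S_k^+ = max(S_k, 0) for a path of length k."""
    return max(sum(path), 0)


def mean_max_via_identity(n):
    """Right-hand side: sum over k = 1..n of (1/k) E[S_k^+]."""
    return sum(
        Fraction(1, k) * expectation(positive_part_of_sum, k) for k in range(1, n + 1)
    )


for n in range(1, 6):
    print(n, expectation(running_max, n), mean_max_via_identity(n))
```

Both columns agree exactly for each n, as the identity requires for any i.i.d. step law.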
4: The Important Concept of Ladder Random Variables 


We say n is a ladder point for the partial sums $S_j$, $j = 0, 1, \ldots, n$, if $S_n$ is at least
as great as any previous partial sum, i.e., if

$$S_n \ge S_j \quad \text{for all } j = 0, 1, \ldots, n.$$

Let N be the number of ladder points in the series $\{S_n\}$. If the summands
have a positive mean $\mu$, by the law of large numbers $S_n/n \to \mu > 0$ and $S_n \to \infty$
as $n \to \infty$; so in this case $N = \infty$. If the summands have a negative mean $\mu$,
$S_n \to -\infty$; so here N is finite. But $N \ge 1$ because 0 is a ladder point.

Define $W_1$ to be the waiting time to the first ladder point after time zero,
with $W_1 = \infty$ should no such ladder point exist. Let $W_2$ be the waiting time
from the first to the second ladder point, with $W_2 = \infty$ should the second not
exist. In general, define $W_r$ as the waiting time from the (r − 1)st to the rth ladder
point for r < N and $W_N = \infty$ when N is finite. In symbols, this reads:

$$W_1 = T_1 = \inf\{n;\ n > 0,\ S_n \ge S_j \text{ for all } j = 0, 1, \ldots, n\},$$
$$T_2 = \inf\{n;\ n > W_1,\ S_n \ge S_j \text{ for all } j = 0, 1, \ldots, n\}, \tag{4.1}$$
$$W_2 = T_2 - T_1,$$

and in general,

$$T_r = \inf\{n;\ n > T_{r-1},\ S_n \ge S_j \text{ for all } j = 0, 1, \ldots, n\},$$
$$W_r = T_r - T_{r-1}, \qquad 1 \le r < N.$$

Further, let $Z_1$ be the value of the partial sum $S_j$ at the first ladder point, $Z_2$ the
value of the partial sum $S_j$ at the second ladder point less $Z_1$, and in general $Z_k$
the value of the partial sum $S_j$ at the kth ladder point less $Z_1 + \cdots + Z_{k-1}$,
for k < N. In symbols,

$$Z_1 = S_{T_1}, \qquad Z_2 = S_{T_2} - S_{T_1},$$

and in general

$$Z_r = S_{T_r} - S_{T_{r-1}}, \qquad r = 1, 2, \ldots. \tag{4.2}$$

We may add for convenience the definitions

$$T_0 = W_0 = 0, \quad Z_0 = 0, \quad \text{and} \quad T_N = W_N = \infty \text{ when } N < \infty.$$
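The definitions (4.1) and (4.2) translate directly into a single pass over a path. The Python sketch below (an illustrative helper, not taken from the text) records the ladder epochs $T_r$, waiting times $W_r$, and ladder increments $Z_r$ for a finite list of steps. Since only a finite stretch of the walk is observed, the conventions $T_N = W_N = \infty$ are not represented; the lists simply stop at the last ladder point seen.

```python
def ladder_variables(steps):
    """Return (T, W, Z) for the weak ladder points of a finite path.

    Index 0 holds the conventions T_0 = W_0 = 0 and Z_0 = 0.
    """
    T, W, Z = [0], [0], [0]
    s = 0              # current partial sum S_n
    ladder_height = 0  # partial sum at the latest ladder point (the running maximum)
    for n, x in enumerate(steps, start=1):
        s += x
        if s >= ladder_height:  # equivalent to S_n >= S_j for all j = 0..n
            W.append(n - T[-1])
            Z.append(s - ladder_height)
            T.append(n)
            ladder_height = s
    return T, W, Z


T, W, Z = ladder_variables([1, -2, 1, 0, 3, -1])
print(T)  # [0, 1, 5]
print(W)  # [0, 1, 4]
print(Z)  # [0, 1, 2]
```

The comparison against the latest ladder height suffices because the partial sum at the most recent ladder point always equals the running maximum of the path so far.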


We now state a lemma about the distribution of N and the random vectors
$(W_r, Z_r)$.

Lemma 4.1. N has the geometric distribution in which

$$\Pr\{N > k\} = \beta^k, \qquad k = 1, 2, \ldots,$$

where $\beta = \Pr\{W_1 < \infty\} = \Pr\{S_n \ge 0 \text{ for some } n = 1, 2, \ldots\}$. ($N = \infty$ corresponds
to $\beta = 1$.) Given N = n > 0, the random vectors $(W_r, Z_r)$, $1 \le r < N$, are
independent and identically distributed.

Note. The statement of the lemma is highly intuitive in view of the fact that
$X_1, X_2, \ldots$ are random variables such that the process starts afresh with the
value $S_{T_{r-1}}$ and this corresponds to zero with reference to the next waiting
period $W_r$. We encourage the student to supply a formal proof.

The next theorem will be of considerable value once Theorem 3.1 is validated.
Tentatively it is a vehicle toward our developments of fluctuation theory.


Theorem 4.1. For u and t real and |t| < 1,

$$\frac{1}{1 - E[e^{iuZ_1} t^{W_1}]} = P(u; t) \tag{4.3}$$

(for the definition of P(u; t) see (3.1)).

Proof. Expanding the geometric series

$$\frac{1}{1 - E[e^{iuZ_1} t^{W_1}]} = \sum_{k=0}^{\infty} \left( E[e^{iuZ_1} t^{W_1}] \right)^k \tag{4.4}$$

is permissible since

$$\left| E[e^{iuZ_1} t^{W_1}] \right| \le E[|t|^{W_1}] < 1 \quad \text{for } |t| < 1.$$

Moreover, by Lemma 4.1

$$E[\exp\{iu(Z_1 + \cdots + Z_k)\}\, t^{W_1 + \cdots + W_k}] = \prod_{j=1}^{k} E[e^{iuZ_j} t^{W_j}] = \left( E[e^{iuZ_1} t^{W_1}] \right)^k. \tag{4.5}$$

Further, decomposing for fixed k the values of $W_1 + \cdots + W_k$ into their mutually
exclusive possible values allows the left-hand side to be written in the form

$$\begin{aligned}
E[\exp\{iu(Z_1 + \cdots + Z_k)\}\, t^{W_1 + \cdots + W_k}]
&= \sum_{n=0}^{\infty} \int_{\{W_1 + \cdots + W_k = n\}} \exp\{iu(Z_1 + \cdots + Z_k)\}\, t^{W_1 + \cdots + W_k}\, dF \\
&= \sum_{n=0}^{\infty} t^n E[\exp\{iuS_n\}; W_1 + \cdots + W_k = n]
\end{aligned} \tag{4.6}$$

because

$$Z_1 + \cdots + Z_k = S_n \quad \text{when } W_1 + \cdots + W_k = n.$$

Substituting (4.6), with cognizance of (4.5), into (4.4) produces

$$\begin{aligned}
\frac{1}{1 - E[e^{iuZ_1} t^{W_1}]} &= \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} t^n E[e^{iuS_n}; W_1 + \cdots + W_k = n] \\
&= \sum_{n=0}^{\infty} t^n \sum_{k=0}^{\infty} E[e^{iuS_n}; W_1 + \cdots + W_k = n],
\end{aligned} \tag{4.7}$$

where the interchange of summations is justified, since the double sum is
absolutely convergent. Now observe that the event $L_{nn} = n$, asserting that the
last maximum occurs at n, may be expanded as the union of the mutually
exclusive events that the index n is the kth ladder point for some k, $k = 1, 2, 3, \ldots$.
Therefore

$$E[\exp\{iuS_n\}; L_{nn} = n] = \sum_{k=0}^{\infty} E[e^{iuS_n}; W_1 + \cdots + W_k = n].$$

Then placing this representation into (4.7) confirms (4.3). The theorem is
proved. ∎

As a sample application, let us quickly develop an expression for the
$\beta = \Pr\{W_1 < \infty\}$ that appears in Lemma 4.1. Observe that
$\beta = \Pr\{W_1 < \infty\} = \lim_{t \to 1} E[t^{W_1}]$. Hence, using (4.3) with u = 0,

$$\frac{1}{1 - \beta} = \lim_{t \to 1} P(0; t) = \exp\left( \sum_{k=1}^{\infty} \frac{1}{k} \Pr\{S_k \ge 0\} \right) \qquad \text{(using (3.3))}$$

or

$$\beta = 1 - \exp\left( -\sum_{k=1}^{\infty} \frac{1}{k} \Pr\{S_k \ge 0\} \right).$$
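The series expression for $\beta$ can be evaluated numerically in simple cases. The Python sketch below assumes, purely for illustration, a walk with steps +1 (probability p) and −1 (probability 1 − p) with p < 1/2, and truncates the series after a fixed number of terms. For this walk an elementary gambler's-ruin argument (not part of the text) gives $\Pr\{S_n \ge 0 \text{ for some } n \ge 1\} = 2p$, which provides an independent check.

```python
from math import exp


def ladder_probability(p, terms=400):
    """Approximate beta = 1 - exp(-sum_{k>=1} Pr{S_k >= 0} / k).

    Assumes steps +1 with probability p and -1 with probability 1 - p;
    the series is truncated after `terms` terms.
    """
    q = 1.0 - p
    dist = {0: 1.0}  # law of S_0
    series = 0.0
    for k in range(1, terms + 1):
        nxt = {}
        for s, pr in dist.items():
            nxt[s + 1] = nxt.get(s + 1, 0.0) + pr * p
            nxt[s - 1] = nxt.get(s - 1, 0.0) + pr * q
        dist = nxt  # law of S_k
        series += sum(pr for s, pr in dist.items() if s >= 0) / k
    return 1.0 - exp(-series)


print(ladder_probability(0.3))  # compare with 2p = 0.6
```

For p < 1/2 the terms decay geometrically, so the truncation error is negligible; for p close to 1/2 many more terms would be needed, and for p ≥ 1/2 the series diverges, corresponding to $\beta = 1$.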
A companion identity to that of (4.3) is incorporated in the next theorem. 


Theorem 4.2. For u and t real and |t| < 1, we have

$$\frac{1 - E[e^{iuZ_1} t^{W_1}]}{1 - t\varphi(u)} = Q(u; t) \tag{4.8}$$

(see (3.2) for the definition of Q(u; t)).

Proof. Consider the expression

$$\begin{aligned}
[1 - t\varphi(u)]\, Q(u; t) &= [1 - t\varphi(u)] \sum_{n=0}^{\infty} E[\exp(iuS_n); L_{nn} = 0]\, t^n \\
&= \sum_{n=0}^{\infty} E[\exp(iuS_n); L_{nn} = 0]\, t^n - \sum_{n=0}^{\infty} \varphi(u)\, E[\exp(iuS_n); L_{nn} = 0]\, t^{n+1}.
\end{aligned} \tag{4.9}$$

With cognizance of the hypothesis that the $X_n$ are independent and identically
distributed and that therefore $X_{n+1}$ is independent of any event determined by
the random variables $\{X_1, \ldots, X_n\}$, we find that

$$\begin{aligned}
\varphi(u)\, E[\exp(iuS_n); L_{nn} = 0] &= E[\exp(iuX_{n+1})]\, E[\exp(iuS_n); L_{nn} = 0] \\
&= E[\exp(iuS_{n+1}); L_{nn} = 0].
\end{aligned} \tag{4.10}$$

Combining (4.9) and (4.10), a change of variable and the obvious cancellation
of adjacent terms ultimately produces the equations

$$\begin{aligned}
[1 - t\varphi(u)] \sum_{n=0}^{\infty} E[\exp(iuS_n); L_{nn} = 0]\, t^n
&= \sum_{n=0}^{\infty} E[\exp(iuS_n); L_{nn} = 0]\, t^n - \sum_{n=1}^{\infty} E[\exp(iuS_n); L_{n-1,n-1} = 0]\, t^n \\
&= 1 - \sum_{n=1}^{\infty} E[\exp(iuS_n); L_{n-1,n-1} = 0,\ L_{nn} \ne 0]\, t^n.
\end{aligned} \tag{4.11}$$

Now observe that the last maximum will be at position zero if and only if the
waiting time to the first ladder point exceeds n. In symbols, $L_{nn} = 0$ if and only
if $W_1 > n$. Thus $L_{n-1,n-1} = 0$ and $L_{nn} \ne 0$ simultaneously hold if and only if
$W_1 = n$.

Using this fact in (4.11) we obtain

$$\begin{aligned}
[1 - t\varphi(u)] \sum_{n=0}^{\infty} E[\exp(iuS_n); L_{nn} = 0]\, t^n &= 1 - \sum_{n=1}^{\infty} E[\exp(iuS_n); W_1 = n]\, t^n \\
&= 1 - E[\exp(iuZ_1)\, t^{W_1}]
\end{aligned}$$

because $S_n = Z_1$ when $W_1 = n$. This establishes (4.8). ∎


5: Proof of the Main Fluctuation Theory Identities 


The task of this section is to furnish the proofs of Theorems 3.1 and 3.2 and of
additional interesting identities, formulas, and general byproducts. We commence
with the derivation of another double generating function identity of
importance.


Theorem 5.1. For $|x| \le 1$, $|t| < 1$, x and t real, we have

$$\sum_{n=0}^{\infty} E[\exp(iuS_n)\, x^{L_{nn}}]\, t^n = P(u; tx)\, Q(u; t). \tag{5.1}$$


Proof. Observe that the partial sums of the finite set of variables $X_1, X_2, \ldots, X_n$
have their last maximum at the position k ($0 \le k \le n$) if and only if the partial
sums of $X_1, X_2, \ldots, X_k$ have their last maximum at k and the partial sums
based on $X_{k+1}, X_{k+2}, \ldots, X_n$ exhibit their last maximum at zero. In symbols,

$$L_{nn}(X_1, \ldots, X_n) = k \tag{5.2}$$

if and only if

$$L_{kk}(X_1, \ldots, X_k) = k \quad \text{and} \quad L_{n-k,n-k}(X_{k+1}, \ldots, X_n) = 0. \tag{5.3}$$

Let $I_{\{L_{kk} = k\}}$ denote the indicator function of the first event of (5.3) and
$I_{\{\tilde{L}_{n-k,n-k} = 0\}}$ bear the corresponding connotation for the last event in (5.3),
where $\tilde{L}_{n-k,n-k} = L_{n-k,n-k}(X_{k+1}, \ldots, X_n)$.
The two events of (5.3) are independent, since their characterizations
involve independent sets of random variables. It follows that

$$\begin{aligned}
E[\exp(iuS_n); L_{nn} = k] &= E[\exp(iuS_k) \exp\{iu(S_n - S_k)\}; L_{kk} = k,\ \tilde{L}_{n-k,n-k} = 0] \\
&= \int \left( e^{iuS_k} I_{\{L_{kk} = k\}} \right) \left( e^{iu(S_n - S_k)} I_{\{\tilde{L}_{n-k,n-k} = 0\}} \right) dF(x).
\end{aligned} \tag{5.4}$$

The expression (5.4) equals

$$E[\exp(iuS_k); L_{kk} = k]\; E[\exp\{iu(S_n - S_k)\}; \tilde{L}_{n-k,n-k} = 0]$$

because $S_n - S_k$ and $\{\tilde{L}_{n-k,n-k} = 0\}$ are determined by random variables
independent of $X_1, X_2, \ldots, X_k$. Next multiply both sides of the resulting
equation from (5.4) by $x^k$ and sum on k from 0 to n. We obtain

$$\begin{aligned}
E[\exp(iuS_n)\, x^{L_{nn}}] &= \sum_{k=0}^{n} x^k E[\exp(iuS_k); L_{kk} = k]\; E[\exp\{iu(S_n - S_k)\}; \tilde{L}_{n-k,n-k} = 0] \\
&= \sum_{k=0}^{n} x^k a_k b_{n-k}
\end{aligned} \tag{5.5}$$

with the evident definitions for $a_k$ and $b_{n-k}$. Note that $b_{n-k}$ depends
unambiguously solely on n − k since $X_{k+1}, \ldots, X_n$ have the same distribution
as $X_1, \ldots, X_{n-k}$.

Recall the elementary power series equality

$$\sum_{n=0}^{\infty} t^n \sum_{k=0}^{n} a_k b_{n-k} = \left( \sum_{k=0}^{\infty} a_k t^k \right) \left( \sum_{k=0}^{\infty} b_k t^k \right).$$

Comparing this to (5.5), we infer that

$$\begin{aligned}
\sum_{n=0}^{\infty} E[\exp(iuS_n)\, x^{L_{nn}}]\, t^n &= \sum_{k=0}^{\infty} E[\exp(iuS_k); L_{kk} = k]\, x^k t^k \times \sum_{n=k}^{\infty} E[\exp(iuS_{n-k}); L_{n-k,n-k} = 0]\, t^{n-k} \\
&= P(u; tx)\, Q(u; t) \qquad \text{(defined in (3.1) and (3.2))},
\end{aligned}$$

since $S_n - S_k = X_{k+1} + \cdots + X_n$ has the same distribution as $S_{n-k}$. ∎


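Note. Theorems 4.1, 4.2, and 5.1 jointly express the fact that

$$\sum_{n=0}^{\infty} E[\exp(iuS_n)\, x^{L_{nn}}]\, t^n = \frac{P(u; tx)}{[1 - t\varphi(u)]\, P(u; t)}. \tag{5.6}$$

The import of Theorem 3.1 asserts

Theorem 5.2 (Basic Identity). For |t| < 1

$$P(u; t) = \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} E[\exp(iuS_k); S_k \ge 0] \right\} \overset{\text{def}}{=} f_+(u; t) \tag{5.7}$$

and

$$Q(u; t) = \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} E[\exp(iuS_k); S_k < 0] \right\} \overset{\text{def}}{=} f_-(u; t). \tag{5.8}$$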
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Proof. First we show that (5.7) and (5.8) are equivalent. Setting x = 1 in (5.1) 
gives

$$\begin{aligned}
P(u; t)\, Q(u; t) &= \sum_{n=0}^{\infty} E[e^{iuS_n}]\, t^n = \sum_{n=0}^{\infty} [\varphi(u)]^n t^n \qquad \text{(owing to the independence of the r.v.'s } \{X_i\}) \\
&= \frac{1}{1 - t\varphi(u)} = f_+(u; t)\, f_-(u; t),
\end{aligned} \tag{5.9}$$

where the last equation was proved earlier in (3.6). It follows from (5.9) that
either of (5.7) and (5.8) implies the other.

We now prove (5.7). Differentiate equation (5.6) with respect to x and set
x = 1 to yield


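$$H(t) = \sum_{n=0}^{\infty} E[e^{iuS_n} L_{nn}]\, t^n = \frac{t P'(u; t)}{[1 - t\varphi(u)]\, P(u; t)}, \tag{5.10}$$

where $P'(u; t)$ indicates $[dP(u; t)]/dt$.
With u fixed we regard (5.10) as a first-order differential equation

$$\frac{P'(u; t)}{P(u; t)} = \frac{1 - t\varphi(u)}{t}\, H(t) \overset{\text{def}}{=} A(t)$$

(i.e., A(t) is the symbol for the middle expression). Integration gives

$$\log\left[ \frac{P(u; t)}{P(u; 0)} \right] = \int_0^t A(\tau)\, d\tau. \tag{5.11}$$

Note that

$$\begin{aligned}
A(t) &= \frac{1}{t} [1 - t\varphi(u)] \sum_{n=0}^{\infty} E[e^{iuS_n} L_{nn}]\, t^n \\
&= \frac{1}{t} [1 - t\varphi(u)] \sum_{n=1}^{\infty} E[e^{iuS_n} L_{nn}]\, t^n \qquad (\text{since } L_{00} = 0) \\
&= \sum_{n=1}^{\infty} E[e^{iuS_n} L_{nn}]\, t^{n-1} - \sum_{n=0}^{\infty} E[e^{iuS_{n+1}} L_{nn}]\, t^n \\
&= \sum_{n=1}^{\infty} E[e^{iuS_n} (L_{nn} - L_{n-1,n-1})]\, t^{n-1},
\end{aligned}$$

and termwise integration gives

$$\int_0^t A(\tau)\, d\tau = \sum_{n=1}^{\infty} E[e^{iuS_n} (L_{nn} - L_{n-1,n-1})]\, \frac{t^n}{n}. \tag{5.12}$$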
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From the definition we see that P(u;0) = 1 and therefore (5.11) and (5.12) 
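produce the representation

$$P(u; t) = \exp\left\{ \sum_{n=1}^{\infty} E[e^{iuS_n} (L_{nn} - L_{n-1,n-1})]\, \frac{t^n}{n} \right\}. \tag{5.13}$$

Now, we make use of the fundamental equivalence, Theorem 2.1, with
$g_n(x) = \exp(iu \sum_{j=1}^{n} x_j)$:

$$\begin{aligned}
E[\exp(iuS_n) L_{nn}] &= \sum_{k=1}^{n} E[\exp(iuS_n) L_{nn}; L_{nn} = k] \qquad \text{(mutually exclusive possibilities)} \\
&= \sum_{k=1}^{n} k\, E[\exp(iuS_n); L_{nn} = k] \\
&= \sum_{k=1}^{n} k\, E[\exp(iuS_n); P_n = k] \qquad \text{(by the equivalence principle (2.16))} \\
&= \sum_{k=1}^{n} E[\exp(iuS_n) P_n; P_n = k] = E[\exp(iuS_n) P_n].
\end{aligned} \tag{5.14}$$

A similar identity holds with $L_{nn}$ and $P_n$ replaced by $L_{n-1,n-1}$ and $P_{n-1}$,
respectively. Comparing (5.13) and (5.14), we conclude that

$$P(u; t) = \exp\left\{ \sum_{n=1}^{\infty} E[\exp(iuS_n)(P_n - P_{n-1})]\, \frac{t^n}{n} \right\}. \tag{5.15}$$

Observe, however, that the number of nonnegative partial sums of $X_1$,
$X_2, \ldots, X_n$ is the same as that of $X_1, X_2, \ldots, X_{n-1}$ unless $S_n$ itself is
nonnegative. More precisely,

$$P_n - P_{n-1} = \begin{cases} 0 & \text{if } S_n < 0, \\ 1 & \text{if } S_n \ge 0. \end{cases}$$

Thus

$$\begin{aligned}
E[\exp(iuS_n)(P_n - P_{n-1})] &= E[\exp(iuS_n)(P_n - P_{n-1}); S_n < 0] + E[\exp(iuS_n)(P_n - P_{n-1}); S_n \ge 0] \\
&= E[\exp(iuS_n); S_n \ge 0].
\end{aligned}$$

Substitution of this into (5.15) gives (5.7), which completes the proof of Theorem
5.2. ∎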
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We are now prepared to give the proof of Theorem 3.2. 


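Proof of Theorem 3.2. Consider

$$E[\exp(iuM_n)] = \sum_{k=0}^{n} E[\exp(iuM_n); L_{nn} = k] = \sum_{k=0}^{n} E[\exp(iuS_k); L_{nn} = k]$$

because $M_n = S_k$ if $L_{nn} = k$. By the relations (5.2) and (5.3) and continuing with
the notation used in the proof of Theorem 5.1, we obtain

$$\begin{aligned}
E[\exp(iuM_n)] &= \sum_{k=0}^{n} E[\exp(iuS_k); L_{kk} = k,\ L_{n-k,n-k}(X_{k+1}, \ldots, X_n) = 0] \\
&= \sum_{k=0}^{n} E[\exp(iuS_k); L_{kk} = k]\; E[1; L_{n-k,n-k} = 0].
\end{aligned}$$

Thus we have

$$c_n = E[\exp(iuM_n)] = \sum_{k=0}^{n} E[\exp(iuS_k); L_{kk} = k]\; E[1; L_{n-k,n-k} = 0] = \sum_{k=0}^{n} a_k b_{n-k}, \tag{5.16}$$

the composition formula for coefficients of power series (cf. the analysis following
(5.5)). Constructing the power series with coefficients $c_n$, it follows from (5.16) that

$$\begin{aligned}
\sum_{n=0}^{\infty} E[\exp(iuM_n)]\, t^n &= \sum_{k=0}^{\infty} E[\exp(iuS_k); L_{kk} = k]\, t^k \sum_{n=k}^{\infty} E[1; L_{n-k,n-k} = 0]\, t^{n-k} \\
&= P(u; t)\, Q(0; t).
\end{aligned}$$

Applying Theorem 5.2, then

$$\sum_{n=0}^{\infty} E[\exp(iuM_n)]\, t^n = \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} E[\exp(iuS_k); S_k \ge 0] \right\} \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} \Pr\{S_k < 0\} \right\},$$

as was to be proved. ∎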
We prove one more consequence of Theorem 5.2, namely, 


Theorem 5.3. For t and x real with |t| < 1 and $|x| \le 1$,

$$\sum_{n=0}^{\infty} t^n \sum_{k=0}^{n} x^k \Pr\{L_{nn} = k\} = \exp\left\{ \sum_{k=1}^{\infty} \frac{(tx)^k}{k} \Pr\{S_k \ge 0\} \right\} \exp\left\{ \sum_{k=1}^{\infty} \frac{t^k}{k} \Pr\{S_k < 0\} \right\}.$$

Proof. Set u = 0 in (5.1) of Theorem 5.1. Then

$$P(0; tx)\, Q(0; t) = \sum_{n=0}^{\infty} E[x^{L_{nn}}]\, t^n. \tag{5.17}$$

But

$$E[x^{L_{nn}}] = \sum_{k=0}^{n} E[x^{L_{nn}}; L_{nn} = k] = \sum_{k=0}^{n} x^k E[1; L_{nn} = k] = \sum_{k=0}^{n} x^k \Pr\{L_{nn} = k\}.$$

Substitution of this into (5.17) yields

$$\sum_{n=0}^{\infty} t^n \sum_{k=0}^{n} x^k \Pr\{L_{nn} = k\} = P(0; tx)\, Q(0; t).$$

Now apply Theorem 5.2 with u = 0. Then the expectations are replaced by
probabilities and Theorem 5.3 is proved. ∎


The identifications of Theorem 5.2 in conjunction with Theorems 4.1 and 
4.2 produce the following important formula: 


Theorem 5.4. For u and t real and |t| < 1

$$1 - E[e^{iuZ_1} t^{W_1}] = \exp\left( -\sum_{k=1}^{\infty} \frac{t^k}{k} E[e^{iuS_k}; S_k \ge 0] \right). \tag{5.18}$$


6: More Applications of Fluctuation Theory 


A. LIMIT THEOREMS FOR THE NUMBER OF POSITIVE SUMS

Under the sole condition

$$\Pr\{S_n > 0\} = \Pr\{S_n \ge 0\} = \tfrac{1}{2} \tag{6.1}$$

(this requirement is fulfilled if $\{X_i\}$ are all symmetrical continuous random
variables; cf. the discussion of (3.10)), we shall establish the famous arcsine law,

$$\lim_{n \to \infty} \Pr\left\{ \frac{P_n}{n} \le x \right\} = \frac{1}{\pi} \int_0^x \frac{d\xi}{\sqrt{\xi(1 - \xi)}} = \frac{2}{\pi} \arcsin \sqrt{x}, \qquad 0 < x < 1.$$

This result is quite remarkable. The ratio $P_n/n$ is the fraction of times (or periods)
up to n for which $S_i \ge 0$, and one might expect that this fraction would most
likely occur near $\tfrac{1}{2}$. Quite the contrary, as shown in Fig. 3. The density
$f(x) = 1/(\pi \sqrt{x(1 - x)})$ is bimodal, which is interpreted as follows: for any particular
realization of $S_0, S_1, S_2, \ldots$ it is far more likely to find the limiting fraction
$P_n/n$ close to 0 or 1 rather than $\tfrac{1}{2}$.

FIG. 3 The density $f(x) = 1/(\pi \sqrt{x(1 - x)})$.


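Recall the formula (3.17)

$$\Pr\{P_n = k\} = \frac{1}{2^{2n}} \binom{2k}{k} \binom{2n - 2k}{n - k}. \tag{6.2}$$

To extract the limit distribution as n increases, we rely on Stirling's asymptotic
approximation

$$m! \approx \left( \frac{m}{e} \right)^m \sqrt{2\pi m}$$

to obtain

$$\binom{2m}{m} = \frac{(2m)!}{(m!)^2} \approx \frac{2^{2m}}{\sqrt{\pi m}}.$$

Thus, for large n, provided k and $n \to \infty$ in a manner implying $k/n \to x$,
0 < x < 1, we secure the asymptotic relation

$$\Pr\{P_n = k\} \approx \frac{1}{\pi} \frac{1}{\sqrt{k(n - k)}} = \frac{1}{\pi n} \frac{1}{\sqrt{\dfrac{k}{n} \left( 1 - \dfrac{k}{n} \right)}}.$$

Compute next

$$\Pr\left\{ \frac{P_n}{n} \le x \right\} \approx \frac{1}{\pi} \sum_{k=1}^{[nx]} \frac{1}{\sqrt{\dfrac{k}{n} \left( 1 - \dfrac{k}{n} \right)}} \cdot \frac{1}{n}.$$

This sum is the approximation to the integral (the student should formalize this
comment)

$$\frac{1}{\pi} \int_0^x \frac{d\xi}{\sqrt{\xi(1 - \xi)}} = \frac{2}{\pi} \arcsin \sqrt{x}.$$

Thus

$$\lim_{n \to \infty} \Pr\left\{ \frac{P_n}{n} \le x \right\} = \frac{2}{\pi} \arcsin \sqrt{x}, \qquad 0 < x < 1. \tag{6.3}$$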
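The quality of the arcsine approximation at finite n can be examined directly from (6.2). The Python sketch below (function names are illustrative) sums the exact probabilities $\Pr\{P_n = k\}$ for $k \le nx$ and compares the result with $(2/\pi) \arcsin \sqrt{x}$; it presumes condition (6.1), under which (6.2) is valid.

```python
from math import asin, comb, pi, sqrt


def discrete_arcsine_cdf(n, x):
    """Pr{P_n <= n x} computed exactly from (6.2), then converted to a float."""
    m = int(n * x)
    total = sum(comb(2 * k, k) * comb(2 * (n - k), n - k) for k in range(m + 1))
    return total / 4 ** n


def arcsine_cdf(x):
    """Limit law (6.3)."""
    return 2.0 / pi * asin(sqrt(x))


n = 400  # hypothetical sample size
for x in (0.25, 0.5, 0.75):
    print(x, discrete_arcsine_cdf(n, x), arcsine_cdf(x))
```

The integer sum is exact and only the final division is carried out in floating point, so the printed discrepancy reflects the approximation (6.3) itself rather than rounding.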
It is striking to contrast the limit distribution (6.3) with the conditional limit
law,

$$\lim_{n \to \infty} \Pr\left\{ \frac{P_n}{n} \le x \,\Big|\, S_n = 0 \right\} = x, \qquad 0 < x < 1. \tag{6.4}$$

Consult Chapter 13, page 123, where the latter limit law is discussed. 


B. SOME PATH FLUCTUATION RESULTS

Our aim is to prove the following list of affirmations:

$$\sum_{k=1}^{\infty} \frac{1}{k} \Pr\{S_k \ge 0\} = \infty \quad \text{if and only if} \quad \Pr\{\lim_{n \to \infty} M_n = \infty\} = 1 \quad \text{if and only if} \quad \Pr\{S_n \ge 0 \text{ i.o.}\} = 1$$

(i.o. stands for infinitely often), while

$$\sum_{k=1}^{\infty} \frac{1}{k} \Pr\{S_k \ge 0\} < \infty \quad \text{if and only if} \quad \Pr\{\sup_n M_n < \infty\} = 1 \quad \text{if and only if} \quad \Pr\{S_n \ge 0 \text{ i.o.}\} = 0.$$

Recall that $\{S_n \ge 0 \text{ i.o.}\}$ is the event of $S_n \ge 0$ occurring for infinitely many
n. Its complement, $\{S_n \ge 0 \text{ f.o.}\}$ (f.o. abbreviates finitely often), is the event of $S_n \ge 0$
happening for at most a finite number of values n.
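We begin by defining

$$q_n = \Pr\{L_{nn} = 0\} = \Pr\{S_j < 0 \text{ for } j = 1, \ldots, n\}.$$

Then manifestly $q_n \ge q_{n+1} \ge 0$, so that $\{q_n\}$ converges and

$$\lim_{n \to \infty} q_n = \Pr\{S_j < 0 \text{ for all } j \ge 1\}.$$

Now

$$(1 - t) \sum_{k=0}^{\infty} q_k t^k = \sum_{k=0}^{\infty} q_k t^k - \sum_{k=0}^{\infty} q_k t^{k+1} = q_0 - \sum_{k=1}^{\infty} (q_{k-1} - q_k)\, t^k. \tag{6.5}$$

Since the coefficients of the series in (6.5) are nonnegative, we may apply the
elementary Tauberian theorem, part (b) of Lemma 5.1 in Chapter 2, to conclude

$$\begin{aligned}
\lim_{t \to 1} (1 - t) \sum_{k=0}^{\infty} q_k t^k &= \lim_{n \to \infty} \left[ q_0 - \sum_{k=1}^{n} (q_{k-1} - q_k) \right] \\
&= \lim_{n \to \infty} q_n \\
&= \Pr\{S_j < 0 \text{ for all } j \ge 1\}.
\end{aligned}$$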
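At this point we display the basic identity (5.8):

$$Q(0; t) = \sum_{n=0}^{\infty}\Pr\{L_n = 0\}\,t^n = \sum_{n=0}^{\infty} q_n t^n = \exp\Big(\sum_{k=1}^{\infty}\frac{t^k}{k}\Pr\{S_k \le 0\}\Big).$$

Multiply this expression by

$$1 - t = \exp\Big(-\sum_{k=1}^{\infty}\frac{t^k}{k}\Big)$$

to get

$$(1 - t)\sum_{n=0}^{\infty} q_n t^n = \exp\Big(-\sum_{k=1}^{\infty}\frac{t^k}{k}\Big)\exp\Big(\sum_{k=1}^{\infty}\frac{t^k}{k}\Pr\{S_k \le 0\}\Big) = \exp\Big(-\sum_{k=1}^{\infty}\frac{t^k}{k}\Pr\{S_k > 0\}\Big).$$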


The left-hand side has a limit as t → 1, and so must the right. Taking account of the nonnegative nature of the coefficients, it follows that (again consult Lemma 5.1 of Chapter 2)

$$\lim_{t\to 1}\sum_{k=1}^{\infty}\frac{t^k}{k}\Pr\{S_k > 0\} = \sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k > 0\} \le \infty.$$


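Thus

$$\Pr\{S_j \le 0 \text{ for all } j \ge 1\} = \lim_{t\to 1}(1 - t)\sum_{n=0}^{\infty} q_n t^n = \lim_{t\to 1}\exp\Big(-\sum_{k=1}^{\infty}\frac{t^k}{k}\Pr\{S_k > 0\}\Big) = \exp\Big(-\sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k > 0\}\Big).$$

Hence

$$\sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k > 0\} = \infty \quad\text{implies}\quad \Pr\{S_j \le 0 \text{ for all } j \ge 1\} = 0 \tag{6.6}$$

and

$$\sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k > 0\} < \infty \quad\text{implies}\quad \Pr\{S_j \le 0 \text{ for all } j \ge 1\} > 0. \tag{6.7}$$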


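Next we aim to prove the assertion

$$\Pr\{S_n > 0 \text{ i.o.}\} = \begin{cases} 0 & \text{if } \sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k > 0\} < \infty, \\[4pt] 1 & \text{if } \sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k > 0\} = \infty. \end{cases}$$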
To this end, suppose $\sum_{k=1}^{\infty}(1/k)\Pr\{S_k > 0\} = \infty$, so that by (6.6)

$$\Pr\{S_j \le 0 \text{ for all } j \ge 1\} = 0. \tag{6.8}$$

Then

$$1 - \Pr\{S_n > 0 \text{ i.o.}\} = \Pr\{S_n > 0 \text{ f.o.}\} \le \Pr\{S_j \le 0 \text{ for all } j \ge 1\} + \sum_{k=1}^{\infty}\Pr\{S_k > 0 \text{ and } S_{k+l} \le 0 \text{ for all } l \ge 1\}.$$

The kth term of the series evaluates the probability of the event signifying that S_k is the last positive partial sum among $\{S_j\}_{j\ge 1}$. Continuing, this is

$$\le 0 + \sum_{k=1}^{\infty}\Pr\{S_k > 0 \text{ and } S_{k+l} - S_k \le 0 \text{ for all } l \ge 1\} = \sum_{k=1}^{\infty}\Pr\{S_k > 0\}\Pr\{S_l \le 0 \text{ for all } l \ge 1\}$$

(by virtue of the independence and identical distribution hypothesis)

$$= 0 \qquad \text{(applying (6.8) to each term).}$$

Hence Pr{S_n > 0 i.o.} = 1.

Conversely, suppose $\sum_{k=1}^{\infty}(1/k)\Pr\{S_k > 0\} < \infty$. Then Pr{S_n ≤ 0 for all n = 1, 2, ...} > 0 and Pr{S_n > 0 i.o.} < 1. But the Hewitt-Savage 0-1 law† implies that the event {S_n > 0 i.o.} has probability 0 or 1 exclusively. In the present circumstances it follows that it must be 0.

Next we show that for any x > 0

$$\Pr\{S_n > 0 \text{ i.o.}\} = \Pr\{S_n > x \text{ i.o.}\}.$$

Again appeal to the Hewitt-Savage 0-1 law, which affirms that both quantities have value either 0 or 1. Manifestly, if Pr{S_n > 0 i.o.} = 0 then Pr{S_n > x i.o.} ≤ Pr{S_n > 0 i.o.} = 0. Next suppose Pr{S_n > 0 i.o.} = 1. By virtue of the 0-1 law we need merely show Pr{S_n > x i.o.} > 0.


† A function $g(x_1, x_2, \dots)$ is called exchangeable if $g(x_1, \dots, x_N, x_{N+1}, \dots) = g(x_{\sigma(1)}, \dots, x_{\sigma(N)}, x_{N+1}, \dots)$ for every N and finite permutation $(\sigma(1), \dots, \sigma(N))$ of $(1, \dots, N)$. An event A is exchangeable if its indicator function $I_A(X_1, X_2, \dots)$ is exchangeable. The Hewitt-Savage 0-1 law states that the probability of every exchangeable event is 0 or 1 when $X_1, X_2, \dots$ are independent and identically distributed random variables. For any x, the event {S_n > x i.o.} is exchangeable: its occurrence does not vary with any finite permutation of the summands. Therefore its probability is 0 or 1 when the summands are independent and identically distributed.
Let N be the first n, if any, for which S_n > x. Excluding the trivial case S_n = 0 for all n, Pr{N < ∞} > 0. Let F(y) = Pr{N < ∞ and S_N ≤ y}. Then

$$\Pr\{S_n > x \text{ i.o.}\} = \int_x^{\infty}\Pr\{S_{N+l} > x \text{ i.o.} \mid N < \infty \text{ and } S_N = y\}\,dF(y)$$

$$\ge \int_x^{\infty}\Pr\{S_{N+l} - S_N > 0 \text{ i.o.} \mid N < \infty \text{ and } S_N = y\}\,dF(y)$$

$$= \int_x^{\infty} dF(y) = \Pr\{N < \infty\} > 0.$$

Hence Pr{S_n > x i.o.} = 1.


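Summing up, we have established the following important fluctuation properties:

$$\begin{aligned}
\sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k > 0\} = \infty \quad &\text{implies}\quad \Pr\{S_n > 0 \text{ i.o.}\} = 1 \\
&\text{implies}\quad \Pr\{S_n > x \text{ i.o.}\} = 1 \text{ for all } x > 0 \\
&\text{implies}\quad \Pr\{M > x\} = 1 \text{ for all } x > 0, \text{ where } M = \max(0, S_1, S_2, \dots) \\
&\text{implies}\quad \Pr\{M = \infty\} = 1,
\end{aligned} \tag{6.9}$$

while

$$\begin{aligned}
\sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k > 0\} < \infty \quad &\text{implies}\quad \Pr\{S_n > 0 \text{ i.o.}\} = 0 \\
&\text{implies}\quad \Pr\{S_n > 0 \text{ f.o.}\} = 1 \\
&\text{implies}\quad \Pr\{S_n > x \text{ f.o.}\} = 1 \text{ for all } x > 0 \\
&\text{implies}\quad \Pr\{M < \infty\} = 1.
\end{aligned} \tag{6.10}$$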


It should be underscored that all the implications in (6.9) are mutually exclusive with those of (6.10). To illustrate, suppose Pr{M < ∞} > 0. Then certainly an x sufficiently large exists with the property

$$\Pr\{S_n > x \text{ i.o.}\} < 1.$$

Therefore, by the 0-1 law, Pr{S_n > x i.o.} = 0. The series in (6.9) cannot then diverge, and it follows from (6.10) that Pr{M < ∞} = 1.
Assuming E[X_i] = μ exists, it is possible to correlate the criteria of (6.9) and (6.10) with the sign of the mean contributed by each observation X_i. The result is as follows: If μ > 0 then

$$\sum_{k=1}^{\infty}\frac{\Pr\{S_k > 0\}}{k} = \infty \tag{6.11}$$

and (6.9) holds. If μ < 0 then

$$\sum_{k=1}^{\infty}\frac{\Pr\{S_k > 0\}}{k} < \infty \tag{6.12}$$

and (6.10) applies.


Proof of (6.11). The strong law of large numbers asserts that

$$\Pr\{S_n > n(\mu - \varepsilon) \text{ i.o.}\} = 1$$

for any positive ε. Taking ε so small that μ − ε > 0 gives, in particular,

$$\Pr\{S_n > 0 \text{ i.o.}\} = 1.$$

Since the conclusions of (6.9) and (6.10) are mutually exclusive, the series in (6.11) must diverge and (6.9) holds.


Proof of (6.12). Application of the law of large numbers informs us that

$$\Pr\Big\{\lim_{n\to\infty}\frac{S_n}{n} < \frac{\mu}{2} < 0\Big\} = 1,$$

i.e., almost every path of the {S_n} process exhibits the characteristic that S_n < 0 for all n beyond a certain time point depending on the realization. Consequently, we have

$$\Pr\{S_n > 0 \text{ f.o.}\} = 1. \tag{6.13}$$

Comparing this result with (6.10) secures the conclusion of (6.12).


A much deeper fact supplementing (6.11) and (6.12) pertains to the circumstance when μ = 0 and where the X_i are nondegenerate random variables. Then necessarily both

$$\sum_{k=1}^{\infty}\frac{\Pr\{S_k > 0\}}{k} = \infty \quad\text{and}\quad \sum_{k=1}^{\infty}\frac{\Pr\{S_k < 0\}}{k} = \infty \tag{6.14}$$

are maintained. In particular, this means that in the case of X_i having mean zero,

$$\Pr\{M = \infty\} = \Pr\{m = -\infty\} = 1 \tag{6.15}$$

with M = max(0, S_1, S_2, ...), m = min(0, S_1, S_2, ...), and the fluctuations of the {S_n} process stretch continually over time from arbitrarily large positive to arbitrarily large negative values and also the other way around.

For the case of symmetric random variables, i.e., where

$$\Pr\{S_n > 0\} = \Pr\{S_n < 0\} = \tfrac{1}{2},$$

then obviously (6.14) holds and extreme fluctuations are present such that (6.15) is correct.
C. CALCULATION OF MEANS OF LADDER RANDOM VARIABLES


Let T be the time (number of periods) until the occurrence of the first positive partial sum in the collection {S_n}. Formally

$$T = \min\{n;\ S_n > 0\}. \tag{6.16}$$

Define T = ∞ whenever S_i ≤ 0 for all i.
Set

$$U = S_T = \text{the first positive } S_n. \tag{6.17}$$

Manifestly, U is a positive random variable and T is positive and integer valued.


Caution. The ladder random variables defined in Sec. 4 involved the first nonnegative partial sum. Certainly if the underlying random variables X_i are continuous, which we stipulate henceforth, then

$$\{T, U\} \quad\text{and}\quad \{W_1, Z_1\}$$

share the same distribution. Intuitively, when the X_i are continuous, the first nonnegative partial sum is positive with probability 1.

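Referring to Theorem 5.4, the joint generating function of {T, U} can be evaluated using the identity

$$E[x^T e^{-\lambda U}] = \sum_{n=1}^{\infty} x^n E[e^{-\lambda S_n};\ T = n] = 1 - \exp\Big(-\sum_{k=1}^{\infty}\frac{x^k}{k}E[e^{-\lambda S_k};\ S_k > 0]\Big), \tag{6.18}$$

valid for every |x| < 1 and λ > 0. All the series and relevant integrals converge absolutely. We proceed to calculate E[T] and E[U]. First insert λ = 0 in (6.18) and rearrange the right-hand side, yielding

$$\begin{aligned}
E[x^T] &= 1 - \exp\Big(-\sum_{k=1}^{\infty}\frac{x^k}{k}\Pr\{S_k > 0\}\Big) \\
&= 1 - \exp\Big(\sum_{k=1}^{\infty}\frac{x^k}{k}\Pr\{S_k \le 0\}\Big)\exp\Big(-\sum_{k=1}^{\infty}\frac{x^k}{k}\Big) \\
&= 1 - (1 - x)\exp\Big(\sum_{k=1}^{\infty}\frac{x^k}{k}\Pr\{S_k \le 0\}\Big).
\end{aligned} \tag{6.19}$$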


We restrict attention henceforth to the case where $\sum_{k=1}^{\infty}\Pr\{S_k \le 0\}/k < \infty$, which clearly implies the condition $\sum_{k=1}^{\infty}\Pr\{S_k > 0\}/k = \infty$. In this situation, on account of (6.9), we know that Pr{S_n > 0 i.o.} = 1 and therefore T is well defined and finite valued. Differentiate (6.19) with respect to x, setting afterwards x = 1. This easily furnishes the result

$$E[T] = \exp\Big(\sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k \le 0\}\Big). \tag{6.20}$$


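Returning to (6.18), we now put x = 1 and execute the following evident elementary manipulations:

$$\begin{aligned}
E[e^{-\lambda U}] &= 1 - \exp\Big\{-\sum_{k=1}^{\infty}\frac{1}{k}E[e^{-\lambda S_k};\ S_k > 0]\Big\} \\
&= 1 - \exp\Big\{\sum_{k=1}^{\infty}\frac{1}{k}E[e^{-\lambda S_k};\ S_k \le 0]\Big\}\exp\Big\{-\sum_{k=1}^{\infty}\frac{1}{k}E[e^{-\lambda S_k}]\Big\} \\
&= 1 - \exp\Big\{\sum_{k=1}^{\infty}\frac{1}{k}E[e^{-\lambda S_k};\ S_k \le 0]\Big\}\exp\Big\{-\sum_{k=1}^{\infty}\frac{1}{k}[\varphi(\lambda)]^k\Big\} \\
&= 1 - [1 - \varphi(\lambda)]\exp\Big\{\sum_{k=1}^{\infty}\frac{1}{k}E[e^{-\lambda S_k};\ S_k \le 0]\Big\},
\end{aligned} \tag{6.21}$$

where $\varphi(\lambda) = E[e^{-\lambda X_1}]$. Here we tacitly postulated the existence of φ(λ) for λ > 0.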
k=1 


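Next differentiate the final expression of (6.21) and subsequently substitute λ = 0 (noting φ(0) = 1 and φ'(0) = −E[X_1] = −μ) to obtain

$$-E[U] = \varphi'(0)\exp\Big(\sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k \le 0\}\Big).$$

Thus we have achieved the formula

$$E[U] = \mu\exp\Big(\sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k \le 0\}\Big) = \mu E[T], \tag{6.22}$$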


where the last equation results by virtue of (6.20). 

The outcome is a version of the Wald identity (cf. Chapter 6). 
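
Formulas (6.20) and (6.22) can be illustrated numerically. The sketch below assumes, purely for illustration, that the X_i are normal with mean μ > 0 and unit variance, so that Pr{S_k ≤ 0} = Φ(−μ√k); this choice of distribution, the truncation of the series, and the function names are assumptions and are not taken from the text.

```python
import random
from math import erfc, exp, sqrt


def mean_ladder_time(mu: float, terms: int = 2000) -> float:
    """E[T] from (6.20) for N(mu, 1) steps, truncating the series after `terms` terms."""
    total = 0.0
    for k in range(1, terms + 1):
        # Pr{S_k <= 0} = Phi(-mu * sqrt(k)), and Phi(-z) = erfc(z / sqrt(2)) / 2.
        total += 0.5 * erfc(mu * sqrt(k) / sqrt(2)) / k
    return exp(total)


def simulate_ladder(mu: float, trials: int, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimates of (E[T], E[U]) for the first positive partial sum."""
    rng = random.Random(seed)
    sum_t = 0.0
    sum_u = 0.0
    for _ in range(trials):
        s, n = 0.0, 0
        while s <= 0.0:  # run until the first strictly positive partial sum
            s += rng.gauss(mu, 1.0)
            n += 1
        sum_t += n
        sum_u += s
    return sum_t / trials, sum_u / trials


if __name__ == "__main__":
    mu = 1.0
    e_t = mean_ladder_time(mu)
    t_hat, u_hat = simulate_ladder(mu, 20000)
    print("E[T] from (6.20):", e_t, " simulated:", t_hat)
    print("mu * E[T] from (6.22):", mu * e_t, " simulated E[U]:", u_hat)
```

The simulation requires μ > 0 so that every run terminates; for μ = 0 the loop still ends with probability 1, but E[T] is infinite and the averages do not stabilize.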

It is evident that (6.22) is impossible when μ = 0. Indeed, recall after consulting (6.14) that in this case necessarily $\sum_{k=1}^{\infty}(1/k)\Pr\{S_k \le 0\} = \infty$ and the right-hand side is not well defined. A much deeper analysis is required, which we do not undertake. It can be shown that for μ = 0

$$E[U] = c\sigma^2,$$

where σ² = Var X_1 and c is a suitable positive constant. It follows that where μ = 0, then E[U] is finite if and only if σ² < ∞.
D. RENEWAL THEORY APPLIED TO LADDER HEIGHTS


Suppose E[X_i] = μ > 0. Continuing the constructions of (6.16), we define (with T_0 = 0)

$$T_r = \min\{n;\ n > T_{r-1},\ S_n > S_{T_{r-1}}\}, \qquad r = 1, 2, \dots. \tag{6.23}$$

The indices T_r determine the successive times of new heights in the process {S_n}. Owing to the assumption E[X_i] = μ > 0, the random variables

$$W_r = T_r - T_{r-1}, \qquad r = 1, 2, \dots,$$

are positive integer valued i.i.d. random variables and generate a renewal sequence (see Chapter 5). The sequence

$$U_r = S_{T_r} - S_{T_{r-1}}, \qquad r = 1, 2, \dots,$$

also induces a renewal process of positive i.i.d. random variables. Consider

$$\sum_{k=1}^{\infty}\Pr\{x < S_{T_k} \le x + h\}.$$


This is obviously the expected number of ladder heights falling in the interval (x, x + h]. Applying the basic renewal limit, Theorem 5.2 of Chapter 5, we obtain

$$\lim_{x\to\infty}\sum_{k=1}^{\infty}\Pr\{x < S_{T_k} \le x + h\} = \frac{h}{E[U_1]}. \tag{6.24}$$

Consider next

$$\lim_{x\to\infty}\sum_{n=1}^{\infty}\Pr\{x < M_n \le x + h,\ S_n > M_{n-1}\} = \frac{h}{E[U_1]}, \tag{6.25}$$

since the sum in (6.25) is clearly recognized as the same sum as that in (6.24).
We conclude this topic by proving the following limit:

$$\lim_{x\to\infty}\sum_{n=1}^{\infty}\Pr\{x < M_n \le x + h\} = \frac{h}{\mu}. \tag{6.26}$$

Decompose the event {x < M_n ≤ x + h} in terms of the index time at which the maximum M_n is first attained. The equality of events

$$\{x < M_n \le x + h\} = \bigcup_{r=0}^{n}\Big[\{x < M_r \le x + h,\ S_r > M_{r-1}\} \cap \{S_k - S_r \le 0,\ k = r + 1, \dots, n\}\Big] \tag{6.27}$$

is a familiar one already used in the proof of Theorem 5.3. Since {X_i} are i.i.d., we infer from (6.27)

$$\Pr\{x < M_n \le x + h\} = \sum_{r=0}^{n}\Pr\{x < M_r \le x + h,\ S_r > M_{r-1}\}\Pr\{L_{n-r} = 0\} = \sum_{r=0}^{n} a_r b_{n-r},$$


exhibiting the usual convolution structure. It follows that

$$\sum_{n=1}^{\infty}\Pr\{x < M_n \le x + h\} = \Big[\sum_{r=1}^{\infty}\Pr\{x < M_r \le x + h,\ S_r > M_{r-1}\}\Big] \times \Big[\sum_{n=0}^{\infty}\Pr\{L_n = 0\}\Big]. \tag{6.28}$$

Referring to the generating function (5.8), we find that

$$\sum_{n=0}^{\infty}\Pr\{L_n = 0\} = \exp\Big(\sum_{k=1}^{\infty}\frac{\Pr\{S_k \le 0\}}{k}\Big). \tag{6.29}$$

The limit of the first sum on the right of (6.28) is h/E[U_1] according to (6.25). But, citing (6.22), we have

$$E[U_1] = \mu E[T_1] = \mu\exp\Big(\sum_{k=1}^{\infty}\frac{\Pr\{S_k \le 0\}}{k}\Big). \tag{6.30}$$
Combining the results of (6.25), (6.29), and (6.30) in (6.28) yields (6.26). That (6.26) holds is rather remarkable since a corresponding renewal result asserts that (see page 192 of Chapter 5)

$$\lim_{x\to\infty}\sum_{n=1}^{\infty}\Pr\{x < S_n \le x + h\} = \frac{h}{\mu}. \tag{6.31}$$

(This limit relation is a version of the extended renewal limit theorem and is considerably deeper.)

Comparing (6.26) and (6.31) reveals that the expected number of visits of the partial sums S_n to a fixed far-off interval equals the expected number of indices n at which the running maximum M_n lies in this same interval.


E. AN INTEGRAL EQUATION FOR THE DISTRIBUTION OF M = max(0,S,,S;,...) 


We define

$$G_n(x) = \Pr\{M_n \le x\}$$

with M_n = max(0, S_1, S_2, ..., S_n). Observe that, for x ≥ 0, M_n ≤ x and X_1 = ξ are equivalent to the relations

$$X_1 = \xi$$

and

$$M'_{n-1} = \max(0, X_2, X_2 + X_3, \dots, X_2 + \dots + X_n) \le x - \xi.$$

Conditioning on the values of X_1 with cognizance of the i.i.d. character of {X_i}, we obtain

$$G_n(x) = \int_{-\infty}^{x} G_{n-1}(x - \xi)\,dF(\xi), \qquad x \ge 0.$$

By stipulating that F is derived from the density f, then

$$G_n(x) = \int_{-\infty}^{x} G_{n-1}(x - \xi) f(\xi)\,d\xi = \int_0^{\infty} G_{n-1}(\eta) f(x - \eta)\,d\eta,$$


where the last equation results by an obvious change of variable. Since

$$G(x) = \lim_{n\to\infty} G_n(x) = \Pr\{M \le x\},$$

we find that G satisfies the integral equation

$$G(x) = \int_0^{\infty} G(\eta) f(x - \eta)\,d\eta, \qquad x \ge 0. \tag{6.32}$$

It should be emphasized that we are interested in solutions of (6.32) apart from G(x) ≡ 0. Actually, a nontrivial solution exists if and only if Pr{M < ∞} = 1, and the necessary and sufficient condition that G be nontrivial is

$$\sum_{k=1}^{\infty}\frac{\Pr\{S_k > 0\}}{k} < \infty \qquad \text{(prove this).}$$

The integral equation (6.32) features prominently in the theory of queueing and is commonly referred to as of Wiener-Hopf type.


Problems 


1. Let X_1, X_2, ... be a sequence of independent, identically distributed integer-valued random variables. Define the partial sums S_0 = 0 and S_k = X_1 + X_2 + ... + X_k for k ≥ 1. The subscript n ≥ 1 is called a ladder index if S_n > S_j for j = 0, 1, ..., n − 1. Call the event that n is a ladder index $\mathcal{E}$. Define Y_0 = 0 and Y_N as the time (i.e., the index) of the last occurrence of $\mathcal{E}$, where the present trial is the Nth. Let W denote the time of first occurrence of $\mathcal{E}$, with W = ∞ should $\mathcal{E}$ not occur. Suppose $\mathcal{E}$ occurs at trial n. Prove that the number of trials until the next occurrence of $\mathcal{E}$ is independent of n and distributed as W.

2. Under the hypothesis of Problem 1 prove the identity

$$\sum_{n=0}^{\infty} t^n E[x^{Y_n}] = \frac{1 - F(t)}{1 - t}\cdot\frac{1}{1 - F(xt)},$$

where $F(t) = \sum_{n=1}^{\infty}\Pr\{W = n\}t^n$. Assume Pr{W < ∞} = 1.

Hint: Use the relation

$$\Pr\{Y_n = k\} = \Pr\{Y_k = k\}\Pr\{Y_{n-k} = 0\}.$$

3. Under the hypothesis of Problem 2 prove the exponential representation

$$U(t) = \exp\Big\{\sum_{k=1}^{\infty}\frac{t^k}{k}\big(E[Y_k] - E[Y_{k-1}]\big)\Big\},$$

where U(t) = 1/[1 − F(t)].

Hint: With the aid of Problem 2 derive the differential equation

$$\frac{U'(t)}{U(t)} = \sum_{k=1}^{\infty} E[Y_k - Y_{k-1}]\,t^{k-1}$$

and solve it.


4. Let X, X_1, X_2, ... be independent identically distributed random variables with the distribution

$$\Pr\{X = 1\} = p, \qquad \Pr\{X = -1\} = q,$$

where p + q = 1. Let S_0 = 0, and for n = 1, 2, ... let S_n = X_1 + ... + X_n.
(i) For p = q = 1/2 compute $\Pr\{\max_{0 \le j \le n} S_j = k\}$. Consider separately the two cases of n even and n odd.
(ii) Suppose p < q, and set $M = \max_{j \ge 0} S_j$. Compute the distribution of M.


5. Establish the identity

$$\sum_{n=0}^{\infty} x^n \Pr\{S_n = 0,\ M_n = 0\} = \exp\Big(\sum_{k=1}^{\infty}\frac{x^k}{k}\Pr\{S_k = 0\}\Big).$$

Hint: Examine the appropriate generating function identity for

$$\sum_{n=0}^{\infty} x^n E[e^{\lambda S_n};\ L_n = 0] \qquad \text{for } \lambda > 0$$

and let λ → ∞.


6. Prove the convergence of the series 


Hint: Define u, = Pr{S, = 0, M, = 0} and show that the sequence is monotone. Then 
use the result of the previous problem. 


7. Determine a generating function identity for the first nonpositive partial sum (consult first formula (5.18)).
8. Determine an identity for

$$\sum_{n=0}^{\infty} x^n E[e^{itM_n}e^{iu(M_n - S_n)}],$$

where |x| < 1 and t and u are real.
Hint: Condition on the index time of the first maximum in the sequence {0, S_1, S_2, ..., S_n}.


9. Assume Pr{S_n < 0} = 1/2 = Pr{S_n > 0}. Determine the limit as n → ∞ of Pr{(n − L_n)/n ≤ x}.

Answer: $(2/\pi)\arcsin\sqrt{x}$.


10. Define T as the time of the first ladder point. Show that the following conditions are equivalent:
(i) Pr{T < ∞} = 1,
(ii) E[T] < ∞,
(iii) $\Pr\{\lim_{n\to\infty} M_n = \infty\} = 1$,
(iv) $\Pr\{\lim_{n\to\infty} S_n = \infty\} = 1$.


11. Prove that if

$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n}\Pr\{S_k > 0\} = \alpha, \qquad 0 < \alpha < 1,$$

and with P_n as defined in (1.3), then

$$\lim_{n\to\infty}\frac{E[P_n]}{n} = \alpha.$$


12. Establish the identity 


© pr,- ELT] 
E AMn] — i 
lee Te 
where Z is the value of the first positive partial sum and T is the index of this partial sum. 
13. Let X_i be i.i.d. r.v.'s with distribution law

$$\Pr\{X_i = 1\} = \Pr\{X_i = -1\} = \tfrac{1}{2},$$

and let

$$S_n = \sum_{i=1}^{n} X_i, \qquad S_0 = 0.$$

Define

U_{2n} = number of {k | S_{k−1} ≥ 0, S_k ≥ 0, k = 1, 2, ..., 2n},
T_{2n} = max{k | S_k = 0, k = 1, 2, ..., 2n}.

Prove

$$\Pr\{U_{2n} = 2k\} = \Pr\{T_{2n} = 2k\} = \binom{2k}{k}\binom{2n - 2k}{n - k}\frac{1}{2^{2n}}.$$
14. Let {X_i; i = 1, 2, ...} be independent, identically distributed random variables such that

$$\Pr\{X_i = +1\} = \Pr\{X_i = -1\} = \tfrac{1}{2}.$$

Let $S_n = \sum_{i=1}^{n} X_i$ for 1 ≤ n < ∞ and P(m, n) = Pr{S_{2j} = 0 for some j satisfying m ≤ j < m + n}.
Prove that P(m, n) + P(n, m) = 1 for m ≥ 1, n ≥ 1.
Hint: Note that Pr{S_{2n} = 0} = Pr{S_1 ≠ 0, S_2 ≠ 0, ..., S_{2n} ≠ 0}. Now assume the result holds for m = k and arbitrary n ≥ 1. Then justify the following equalities:

$$\begin{aligned}
1 - P(k + 1, n) &= \Pr\{S_{2j} \ne 0 \text{ for } k \le j < k + n + 1\} \\
&\quad + \Pr\{S_{2k} = 0 \text{ and } S_{2j} \ne 0 \text{ for } k + 1 \le j < k + n + 1\} \\
&= \Pr\{S_{2j} \ne 0 \text{ for } k \le j < k + n + 1\} \\
&\quad + \Pr\{S_{2k} = 0\}\Pr\{S_{2j} \ne 0 \text{ for } 1 \le j < n + 1\} \\
&= \Pr\{S_{2j} = 0 \text{ for some } j \text{ with } n + 1 \le j < k + n + 1\} \\
&\quad + \Pr\{S_{2j} \ne 0 \text{ for } 1 \le j < k + 1\}\Pr\{S_{2n} = 0\} \\
&= \Pr\{S_{2j} = 0 \text{ for some } j \text{ with } n + 1 \le j < k + n + 1\} \\
&\quad + \Pr\{S_{2n} = 0 \text{ and } S_{2j} \ne 0 \text{ for } n + 1 \le j < k + n + 1\} \\
&= P(n, k + 1).
\end{aligned}$$
15. Find conditions so that

$$\Pr\{P_n \to \infty\} = 1$$

holds in terms of the convergence or divergence of the series

$$\sum_{k=1}^{\infty}\frac{1}{k}\Pr\{S_k > 0\}.$$


16. Let X_1, ..., X_n be independent, symmetric random variables such that

$$\Pr\Big\{\sum_{i=1}^{n}\varepsilon_i X_i = 0\Big\} = 0$$

for all nontrivial choices of the ε_i, ε_i = 0, +1, or −1. For each nonvoid subset T of {1, ..., n} let $S_T = \sum_{i\in T} X_i$ and let N be the number of nonvoid subsets T of {1, ..., n} with the property that S_T > 0. Show that for each m, 0 ≤ m ≤ 2^n − 1, Pr{N = m} = 1/2^n.


Hint: Prove and apply the following result. Let x_1, ..., x_n be real numbers such that $\sum_{i=1}^{n}\varepsilon_i x_i \ne 0$ for each nontrivial choice of the ε_i, ε_i = 0, +1, −1. Then for each m, 0 ≤ m ≤ 2^n − 1, there is a unique sequence {η_i} of signs, η_i = ±1, such that exactly m of the 2^n − 1 sums $\sum_{i\in T}\eta_i x_i$, as T ranges through the nonvoid subsets of {1, ..., n}, are positive.

To prove this result, notice that if

$$A(\eta_1, \dots, \eta_n) = \Big\{x \,\Big|\, x = \sum_{i\in T}\eta_i x_i \text{ for some } T \subseteq \{1, \dots, n\}\Big\},$$

where $\sum_{i\in\emptyset}\eta_i x_i$ is interpreted to be zero, then

$$A(\eta_1, \dots, \eta_{j-1}, -\eta_j, \eta_{j+1}, \dots, \eta_n) = \{x - \eta_j x_j \mid x \in A(\eta_1, \dots, \eta_n)\}.$$
17. Let X, X_1, X_2, ... be independent and identically distributed random variables with means E[X_i] = −μ < 0 and variance Var(X_i) = σ² < ∞. With S_0 = 0 and S_n = X_1 + ... + X_n, n ≥ 1, let $M = \max_{n \ge 0} S_n$. Prove that M and (X + M)^+ share the same distribution, where x^+ = max{x, 0}.


18 (continuation). Using the same notation as in Problem 17, establish the inequality

$$E[M] \le \frac{\sigma^2}{2\mu}.$$

Assume E[M²] < ∞.
Hint: M has the same distribution as (X + M)^+, whence

$$E[M] = E[(M + X)^+] \quad\text{and}\quad E[M^2] = E[\{(M + X)^+\}^2].$$

With the notation x^− = (−x)^+,

$$M + X = (M + X)^+ - (M + X)^-, \qquad (M + X)^2 = \{(M + X)^+\}^2 + \{(M + X)^-\}^2,$$

and so

$$E[(M + X)^-] = -E[X] \quad\text{and}\quad E[\{(M + X)^-\}^2] = 2E[M]E[X] + E[X^2].$$

But

$$E[\{(M + X)^-\}^2] \ge \{E[(M + X)^-]\}^2,$$

and so

$$2E[M]E[X] + E[X^2] - (E[X])^2 \ge 0,$$

from which the result follows.


NOTES 


The fundamental Spitzer formula and related functional identities provide powerful tools for determining distributions of a wide spectrum of natural functionals on the process of sums of i.i.d. random variables. Spitzer [1] and Feller [2] set forth this important topic, with its numerous ramifications and applications to queueing and storage phenomena, probabilistic potential theory, and more elaborate compound renewal structures.
Takacs [3] emphasizes the intrinsic combinatorial nature of many of the identities.
Chapter 18


QUEUEING PROCESSES 


1: General Description 


In Chapter 3 we discussed two simple examples of discrete time queueing 
processes. In this chapter we develop more completely the underlying concepts 
and methods of several continuous time versions of queueing processes. The 
general queueing model embodies the following physical structure. Customers† arrive at random times to some facility and request service of some kind.
Queueing processes are classified according to 


(1) Input distribution (input process), the distribution of the pattern of 
arrivals of customers in time, or more specifically, the distribution of time 
between arrivals; 

(2) Service distribution, the distribution of time to serve a customer; 
and 

(3) Queue discipline, the number of servers and the organization of line 
and service. In most models the discipline is “first come, first served,” with 
service for a customer beginning as soon as he reaches the head of the line. All 
the models we shall consider in this chapter are of this type. (For other types of 
queue disciplines we refer the reader to Problems 2-4 of this chapter.) 

We always assume that the durations between successive arrivals of custom- 
ers, called the interarrival times, are independent, identically distributed, 


† Customer is a generic term which may refer, for example, to bona fide customers demanding service at a counter, ships entering a port, flow of messages into an office, broken machines awaiting repair, etc.


positive random variables. Such input processes, referred to as renewal processes, are extensively discussed in Chapter 5.

The term “random arrival” is sometimes used to indicate that the input 
distribution is Poisson, more precisely, the events corresponding to the arrival 
of customers form a Poisson process. This implies that the interarrival distribu- 
tion is exponential. 

It is also assumed that the lengths of service for individual customers are 
independent, identically distributed, random variables, and independent of the 
input process. 


2: The Simplest Queueing Processes (M/M/1)†


The simplest and most extensively studied queueing processes are those where 
the input process is Poisson and the service distribution is exponential. We have 
already described these processes and shown that the queue size forms a birth 
and death process (cf. Example 2, Section 6, Chapter 4). 

Let us examine the one-server case again. The density function of the interarrival time is

$$dF(t) = \lambda e^{-\lambda t}\,dt, \qquad \lambda > 0,$$

and the service density is

$$dG(t) = \mu e^{-\mu t}\,dt, \qquad \mu > 0.$$

Because of the "forgetfulness" property (Theorem 2.2, Chapter 4) of the exponential distribution we conclude easily that the process X(t) = length of queue at time t is a time homogeneous Markov process and is in fact a birth and death process. Let P_{ij}(t) be the transition probability function. Then P_{i,i+1}(h) is the probability of a single new arrival occurring in the next h units of time with no services completed. Thus for small h > 0

$$P_{i,i+1}(h) = \lambda h + o(h), \qquad i \ge 0.$$

Similarly, we find

$$P_{i,i-1}(h) = \mu h + o(h), \qquad i \ge 1,$$

and

$$P_{ii}(h) = 1 - (\lambda + \mu)h + o(h) \quad (i \ge 1), \qquad P_{00}(h) = 1 - \lambda h + o(h).$$


† A standardized shorthand is used in much of the literature for identifying simple queueing processes. We shall include for reference the shorthand names of the processes we study. In the symbol A/B/c, c is the number of servers, while A and B indicate the arrival and service distribution, respectively. The symbols used in the first two places are G = GI, general independent; M, exponential interarrival or service times; E_k (Erlangian), interarrival or service times distributed as a gamma distribution of order k (so that E_1 = M); and D (deterministic), a schedule of arrivals or fixed service lengths.
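The infinitesimal generator is

$$A = \begin{pmatrix}
-\lambda & \lambda & 0 & 0 & \cdots \\
\mu & -(\lambda + \mu) & \lambda & 0 & \cdots \\
0 & \mu & -(\lambda + \mu) & \lambda & \cdots \\
\vdots & & \ddots & \ddots & \ddots
\end{pmatrix}.$$

We indicated in Section 6 of Chapter 4 that

$$\lim_{t\to\infty} P_{ij}(t) = p_j = \begin{cases}(\lambda/\mu)^j(1 - \lambda/\mu), & \lambda < \mu, \\ 0, & \lambda \ge \mu.\end{cases}$$

This gives us the answer to many problems involving stationarity. If the process has been going on a long time and λ < μ, the probability of being served immediately upon arrival is

$$p_0 = 1 - \frac{\lambda}{\mu}.$$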


We can also calculate the distribution of waiting time in the stationary case when λ < μ. If an arriving customer finds n people in front of him, his total waiting time, T, including his own service time, is the sum of the service times of himself and those ahead, all distributed exponentially with parameter μ, and since the service times are independent of the queue size, T has a gamma distribution of order n + 1 with scale parameter μ:

$$\Pr\{T \le t \mid n \text{ ahead}\} = \int_0^t \frac{\mu^{n+1}\tau^n e^{-\mu\tau}}{\Gamma(n + 1)}\,d\tau. \tag{2.1}$$

By the law of total probabilities, we have

$$\Pr\{T \le t\} = \sum_{n=0}^{\infty}\Pr\{T \le t \mid n \text{ ahead}\}\Big(\frac{\lambda}{\mu}\Big)^n\Big(1 - \frac{\lambda}{\mu}\Big),$$

since $(\lambda/\mu)^n(1 - \lambda/\mu)$ is the probability that in the stationary case a customer on arrival will find n ahead in line. Now, substituting from (2.1), we obtain

$$\begin{aligned}
\Pr\{T \le t\} &= \sum_{n=0}^{\infty}\int_0^t \frac{\mu^{n+1}\tau^n e^{-\mu\tau}}{\Gamma(n + 1)}\Big(\frac{\lambda}{\mu}\Big)^n\Big(1 - \frac{\lambda}{\mu}\Big)\,d\tau \\
&= \int_0^t \mu\Big(1 - \frac{\lambda}{\mu}\Big)e^{-\mu\tau}\Big[\sum_{n=0}^{\infty}\frac{(\lambda\tau)^n}{\Gamma(n + 1)}\Big]\,d\tau \\
&= \int_0^t (\mu - \lambda)\exp\Big[-\tau\mu\Big(1 - \frac{\lambda}{\mu}\Big)\Big]\,d\tau \\
&= 1 - \exp\Big[-t\mu\Big(1 - \frac{\lambda}{\mu}\Big)\Big],
\end{aligned}$$

which is also an exponential distribution.
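
The reduction of the geometric mixture of gamma distributions to a single exponential law can be checked numerically. The sketch below is illustrative only: the function names and the truncation level `n_max` are assumptions, and the gamma distribution function of integer order is evaluated through its standard closed form rather than by numerical integration of (2.1).

```python
from math import exp


def erlang_cdf(stages: int, mu: float, t: float) -> float:
    """CDF at t of a sum of `stages` independent Exp(mu) service times."""
    # Closed form of (2.1): 1 - exp(-mu t) * sum_{j < stages} (mu t)^j / j!
    term, tail = 1.0, 0.0
    for j in range(stages):
        tail += term
        term *= mu * t / (j + 1)
    return 1.0 - exp(-mu * t) * tail


def stationary_wait_cdf(lam: float, mu: float, t: float, n_max: int = 200) -> float:
    """Pr{T <= t} as the mixture over the stationary number n found ahead."""
    rho = lam / mu
    return sum(rho ** n * (1.0 - rho) * erlang_cdf(n + 1, mu, t) for n in range(n_max + 1))


def exponential_cdf(rate: float, t: float) -> float:
    """CDF of an exponential law with the given rate."""
    return 1.0 - exp(-rate * t)


if __name__ == "__main__":
    lam, mu = 1.0, 2.0
    for t in (0.5, 1.5, 4.0):
        print(t, stationary_wait_cdf(lam, mu, t), exponential_cdf(mu - lam, t))
```

The two printed columns should agree up to the truncation error of the geometric series, which is of order (λ/μ)^{n_max + 1}.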
If we wish to answer nonstationary questions, it is essential to determine P_{ij}(t) for all t. This is a much harder problem but it has been solved. The details of this solution are beyond the scope of this book and we refer the interested student to any of the advanced books on queueing theory listed at the close of the chapter.

With the same exponential input and service distribution, when two servers attend a line containing two or more customers, the expected time 1/μ_n until some completion of service is half of that when only one server is engaged. Thus μ_n = 2μ, n ≥ 2. But if n = 1, one server is inactive, so μ_1 = μ, and the infinitesimal generator of this birth and death process has the form

$$A = \begin{pmatrix}
-\lambda & \lambda & 0 & 0 & \cdots \\
\mu & -(\lambda + \mu) & \lambda & 0 & \cdots \\
0 & 2\mu & -(\lambda + 2\mu) & \lambda & \cdots \\
0 & 0 & 2\mu & -(\lambda + 2\mu) & \cdots \\
\vdots & & & \ddots & \ddots
\end{pmatrix}.$$

3: Some General One-Server Queueing Models 


We shall discuss aspects of three methods for analyzing particular cases of the general queueing process GI/GI/1. The first method, known as the integral equation method, reduces the problem of finding the limiting distribution of the waiting time of the nth customer (n → ∞) to the problem of solving an integral equation of so-called Wiener-Hopf type.

If the input process is Poisson, a second method of attack is to examine 
the length of the line at just those moments when a person completes service. 
This embedded process can be seen to be a Markov chain (see Section 4 of 
this chapter). If the service distribution is exponential and the input distribu- 
tion is general then the embedded Markov chain is obtained by inspecting the 
queue size at the instant of each new arrival. The resulting process forms a 
Markov chain of special structure. 

The third method investigates the properties of the random variable W(t), 
which is the time a customer would have to wait if he arrived at time t, re- 
gardless of whether or not there was an actual arrival at time t. This quantity 
is called the virtual waiting time of a fictitious customer arriving at time t. 

We shall begin by considering the integral equation method and then go on 
to the more restricted models to which the method of the embedded Markov 
chain is applicable. Some aspects of the third method will be discussed in 
Section 8. 
A. INTEGRAL EQUATION METHOD†
Let us define the quantities

W_r = the waiting time, excluding service, of the rth customer,

S_r = the service time of the rth customer, and

T_r = time between arrival of rth and (r + 1)th customers (i.e., the rth interarrival time),

with W_0, S_0, T_0 all taken to be zero. We shall assume the first person arrives at the instant t = 0 and finds no one ahead of him.

Clearly W_r + S_r is the length of time that the rth customer spends in the system, i.e., waiting time plus service time. Therefore, if T_r ≥ W_r + S_r, then the (r + 1)th customer will find on arrival no line ahead of him, and so his waiting time W_{r+1} is zero. In the case that T_r < W_r + S_r, then the length of his waiting time is clearly W_r + S_r − T_r. Therefore,

$$W_{r+1} = \begin{cases} W_r + S_r - T_r & \text{if } W_r + S_r - T_r > 0, \\ 0 & \text{if } W_r + S_r - T_r \le 0. \end{cases}$$
Let us write

$$U_r = S_r - T_r.$$

Then clearly $\{U_r\}_{r=1}^{\infty}$ is a sequence of identically and independently distributed r.v.'s. Let F_r(x) be the distribution function of W_r and g(x) = G'(x) the density function‡ of U_r, which by assumption is the same for all r. Then for x ≥ 0, since W_r and U_r are independent r.v.'s,

$$\begin{aligned}
F_{r+1}(x) &= \Pr\{W_{r+1} \le x\} = \Pr\{\max(W_r + U_r, 0) \le x\} = \Pr\{W_r + U_r \le x\} \\
&= \int_{-\infty}^{\infty}\Pr\{W_r + U_r \le x \mid U_r = y\}\,g(y)\,dy \\
&= \int_{y \le x} F_r(x - y)\,g(y)\,dy \qquad (r \ge 1).
\end{aligned} \tag{3.1}$$

Now since the first person arrives at t = 0 and does not wait,

$$F_1(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ 0 & \text{for } x < 0, \end{cases}$$

and since for x < 0 all F_r(x) = 0, then

$$F_1(x) - F_2(x) \ge 0, \qquad -\infty < x < \infty.$$

† The remaining material of this section can be skipped on first reading without loss of continuity.
‡ The restriction that G(x) has a density function is imposed to ease the exposition and is not essential to the following arguments.
But

$$F_r(x) - F_{r+1}(x) = \int_{y \le x}[F_{r-1}(x - y) - F_r(x - y)]\,g(y)\,dy,$$

and so by induction it follows that, for all r,

$$F_r(x) - F_{r+1}(x) \ge 0, \qquad -\infty < x < \infty.$$

This shows that the F_r(x) are nonincreasing with respect to r for each x; and since F_r(x) ≥ 0, it follows that F_r(x) converges to, say, F(x). Passing to the limit in (3.1) yields

$$F(x) = \int_{y \le x} F(x - y)\,g(y)\,dy\;†$$

or, setting z = x − y,

$$F(x) = \int_0^{\infty} F(z)\,g(x - z)\,dz.$$


We must now investigate the question of when the limit F(x) is an honest distribution. It is clear that F(x) is nondecreasing, but it could happen that $\lim_{x\to\infty} F(x) < 1$ rather than $\lim_{x\to\infty} F(x) = 1$. The former possibility can be interpreted to mean that the waiting time of the nth customer (n → ∞) approaches ∞ with positive probability or, equivalently, that the length of the line becomes ∞ (i.e., arbitrarily large) with positive probability.

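We shall first derive a new expression for F(x).

Since

$$F_1(x) = \begin{cases} 1 & \text{for } x \ge 0, \\ 0 & \text{for } x < 0, \end{cases}$$

we have

$$F_2(x) = \int_{-\infty}^{x} g(u)\,du = \Pr\{U_1 \le x\}, \qquad x \ge 0.$$

Now

$$\begin{aligned}
F_3(x) &= \int_{-\infty}^{x} F_2(x - u)\,g(u)\,du = \int_{-\infty}^{x}\Big[\int_{-\infty}^{x - u} g(v)\,dv\Big]g(u)\,du \\
&= \iint_{u_2 \le x,\ u_1 + u_2 \le x} g(u_2)\,g(u_1)\,du_2\,du_1 \\
&= \Pr\{U_2 \le x,\ U_1 + U_2 \le x\} = \Pr\{U_1 \le x,\ U_1 + U_2 \le x\},
\end{aligned}$$

† The justification of interchange of limit and integral requires knowledge of the Lebesgue integral, and should be accepted on faith if not already familiar to the reader.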
where we have used the fact that U_1 and U_2 are independent, identically distributed r.v.'s. A straightforward induction now yields

$$\begin{aligned}
F_{r+1}(x) &= \Pr\{U_r \le x,\ U_r + U_{r-1} \le x,\ \dots,\ U_r + \dots + U_1 \le x\} \\
&= \Pr\{U_1 \le x,\ U_1 + U_2 \le x,\ \dots,\ U_1 + \dots + U_r \le x\}
\end{aligned}$$

(since U_1, U_2, ..., U_r are independent and identically distributed). Thus if $\bar U_r = \sum_{n=1}^{r} U_n$,

$$F_{n+1}(x) = \Pr\{\bar U_r \le x,\ r = 1, \dots, n\}, \qquad x \ge 0.$$

Clearly F_n(x) decreases monotonically with n (we also proved this analytically above), and so

$$F(x) = \Pr\{\bar U_r \le x \text{ for all } r\}, \qquad x \ge 0.$$

For x < 0, all F_n(x) = 0, and so trivially F(x) = 0 for this range.
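
The pathwise fact behind this representation is that the recursion W_{r+1} = max(W_r + U_r, 0), started from W_1 = 0, gives W_{n+1} = max(0, U_n, U_n + U_{n−1}, ..., U_n + ... + U_1). A minimal sketch comparing the two computations follows; the function names are illustrative and not part of the text.

```python
import random


def lindley_waits(u: list[float]) -> list[float]:
    """Waiting times W_1, ..., W_{n+1} from W_1 = 0 and W_{r+1} = max(W_r + U_r, 0)."""
    waits = [0.0]
    for step in u:
        waits.append(max(waits[-1] + step, 0.0))
    return waits


def max_reversed_partial_sum(u: list[float]) -> float:
    """max(0, U_n, U_n + U_{n-1}, ..., U_n + ... + U_1)."""
    best = running = 0.0
    for step in reversed(u):
        running += step
        best = max(best, running)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical increments U_r = S_r - T_r with exponential service and interarrival times.
    u = [rng.expovariate(1.2) - rng.expovariate(1.0) for _ in range(50)]
    print(lindley_waits(u)[-1], max_reversed_partial_sum(u))
```

For example, with u = [1, −2, 3] the recursion gives the waits 0, 1, 0, 3, and the reversed partial sums 3, 1, 2 have maximum 3, in agreement. Reversing the order of the U's does not change the joint distribution, which is why F_{n+1}(x) can be written in terms of the forward partial sums.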

Using the above result, we may determine when F(x) is an honest distribution. Assuming E[S] < ∞ and E[T] < ∞, i.e., the r.v.'s S and T have finite expectations, we have the following.


Theorem 3.1. (i) If E[U] ≥ 0, then F(x) ≡ 0. (ii) If E[U] < 0, then F(x) is a distribution, i.e., $\lim_{x\to\infty} F(x) = 1$.


Intuitively, this result is anticipated. It asserts that if the interarrival time on the average is shorter than the average length of service, the line is sure to grow without bound and W_n → ∞ with probability 1. The proof falls into three parts.


Proof. (1) E[U] > 0.
By the strong law of large numbers,

$$\lim_{n\to\infty}\frac{\bar U_n}{n} = E[U] \qquad \text{with probability 1;}$$

therefore for almost all realizations of the sequence $\bar U_1, \bar U_2, \bar U_3, \dots$, i.e., with probability 1,

$$\bar U_n > \tfrac{1}{2}nE[U] \tag{3.2}$$

for sufficiently large n, where the choice of n depends on the realization. The event $\bar U_r \le x$, for all r, is part of the event complementary to (3.2), and so its probability F(x) is zero.

(2) E[U] <0. 

Again, from the strong law of large numbers we know that, for any pre- 
assigned £ > 0 and arbitrary ô > 0, there exists an integer N, ¿ such that for 
n= N, awe have 


Pe = buy <iforalln> Neal 2l—6. 
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Now let ¢ be small enough so that E[U] + € < 0. Then for any 6 > 0 there is 
an Ns such that 


1 — ô < Pr{U, < n(E[U] + £) for all n > N;} 
< Pr{Ū, < 0 forall n > N} 


by virtue of the property that E[U] + £ < 0. Now since G(x) is a bone fide 
distribution, for this same ô and N; we can choose x sufficiently large so that 


Pr{B} = Pr{Ū, < xforalli<r<N,;—-1}>1-6 


where the event B is defined in the obvious way. 

Let A denote the event $\{\bar U_n < 0 \text{ for all } n \ge N_\delta\}$. The
event $\{\bar U_r \le x \text{ for all } r\}$ includes the intersection of the two events A and B
and so has probability
\[ \ge \Pr\{A \cap B\} = \Pr\{A\} + \Pr\{B\} - \Pr\{A \cup B\} \ge 2 - 2\delta - 1 = 1 - 2\delta. \]
Therefore, for x sufficiently large, $F(x) \ge 1 - 2\delta$, and

\[ \lim_{x \to \infty} F(x) \ge 1 - 2\delta. \]
But since $\delta$ is arbitrary this means that $\lim_{x \to \infty} F(x) = 1$.

(3) $E[U] = 0$.

In this case the result follows from a rather deep theorem dealing with
recurrence phenomena for sums of independent random variables whose presentation
is beyond the level of this book.† A discrete version of the theorem is
Theorem 1.1 of Chapter 12.


B. RECURRENCE OF THE EVENT THAT AN ARRIVING CUSTOMER HAS NO WAITING TIME 


Looking at the discrete time process defined by the variables $W_n$, n = 0, 1, 2, ...,
with state space $[0, \infty)$, we might ask about the recurrence of the event that
some customer finds an empty line. Formally, we say that A occurs at the nth
step if $W_n = 0$. The statement that the event A occurs without further qualification
shall mean that it occurs at some finite step. Notice that whenever
event A occurs the process starts afresh from the value W = 0.


Theorem 3.2. (1) If $E[U] > 0$, the event A is transient (i.e., the probability of
A is less than 1).

(2) If $E[U] \le 0$, the event A is recurrent (i.e., $\Pr\{A\} = 1$).

(3) If $E[U] < 0$, the event A is positive recurrent (i.e., the expected time until
occurrence of A is finite).


Proof. (1) Using the same notation as before, we note that

\[ W_{n+1} \ge \bar U_n \]


† F. Spitzer, “Principles of Random Walk.” Van Nostrand Reinhold, New York, 1964.
(recall that $\bar U_n = U_1 + \cdots + U_n = S_1 + \cdots + S_n - T_1 - T_2 - \cdots - T_n$ = the
difference between the total service times of the first n customers and the total
interarrival times of the first n + 1 customers). Equality holds provided the
server has been busy throughout. Alternatively, if all $\bar U_n > 0$, A does not occur.
By the strong law of large numbers, for any $\varepsilon > 0$ and $\delta > 0$ there is an N such that

\[ \Pr\left\{ \left| \frac{\bar U_n}{n} - E[U] \right| < \varepsilon \text{ for all } n \ge N \right\} \ge 1 - \delta. \]

Thus, if $E[U] > 0$ and $\varepsilon$ is chosen sufficiently small, in fact if $\varepsilon < E[U]$, there is
an N so that

\[ \Pr\{\bar U_n > 0 \text{ for all } n \ge N\} > 0. \]


This says that A may occur only a finite number of times with positive probability,
i.e., the event $\{W_n = 0 \text{ only finitely often}\}$ has positive probability.
But an event is recurrent if and only if it occurs infinitely often with probability 1
(cf. Theorem 7.1, Chapter 2). Therefore, if $E[U] > 0$, A is transient.
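(2) If $E[U] < 0$, for appropriate $\varepsilon$ and $\delta > 0$, there is an N such that

\[ \Pr\left\{ \left| \frac{\bar U_n}{n} - E[U] \right| < \varepsilon \text{ for all } n \ge N \right\} \ge 1 - \delta, \]

so

\[ \Pr\{\bar U_n < 0 \text{ for all } n \ge N\} \ge 1 - \delta, \]

and since $\delta$ is arbitrary, a fortiori
\[ \Pr\{\bar U_n < 0 \text{ for some } n\} = 1. \]

But if $\bar U_n < 0$, then some $W_r = 0$; in particular, if $\bar U_n$ is the first $\bar U_m$ which is
< 0, then $W_{n+1} = 0$, and so if $E[U] < 0$, A is recurrent.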

If E[U] = 0, the result is rather deep, and we do not enter into the details 
of the proof; see, e.g., the references listed at the close of the chapter. 

(3) When $E[U] < 0$, we claim that A is positive recurrent. The proof will
not be given. ∎


4: Embedded Markov Chain Method Applied to the
Queueing Model (M/GI/1)


We examine the special case of a one-server queue with Poisson input process
with parameter $\lambda$. The service distribution is assumed to be any distribution
B(v) of a positive random variable V for which $E[V] < \infty$. For ease of exposition
we will assume that B(v) has a density function b(v). We shall investigate
this process by the method of the embedded Markov chain.
The associated embedded Markov chain is determined as follows.

Let Z(t) denote the number of customers in the queue at time t ($t \ge 0$).
Suppose we look at the Z(t) process just when a customer finishes service. In
this way we generate a sequence of integer values

\[ Z(t_1),\ Z(t_2),\ Z(t_3),\ \ldots, \tag{4.1} \]

where $t_1, t_2, t_3, \ldots$ are the successive times of completion of service. The
sequence $\{Z(t_n)\}$ forms a discrete time process

\[ X_0 = 0, \qquad X_n = Z(t_n), \qquad n = 1, 2, \ldots \tag{4.2} \]

We shall show below that because of the Poisson nature of the input process
the sequence (4.2) describes a Markov chain.

A more intuitive description of the Markov chain goes as follows: 

The transitions of the Markov chain occur only at those times when a 
person completes service. Let the state of the chain, until the next person 
completes service, be the number of people the departing customer leaves 
behind him, including any person who begins service as he leaves. 

It is easy to see that this process is Markov, for if $X_n$ is the state of the system
at time n, then

\[ X_{n+1} = \begin{cases} X_n - 1 + N & \text{if } X_n \ge 1, \\ N & \text{if } X_n = 0, \end{cases} \qquad n = 0, 1, 2, \ldots, \tag{4.3} \]

where N is the number of people who arrived during the service time V of the
(n + 1)th customer. But the random variable V, by assumption, is independent
of previous service times and the length of line. By the stationary character of
the Poisson process, the number of arrivals N during a service period depends
only on V and not on the length of the line or on the moment at which service
began. These facts imply that $\{X_n\}$ is a Markov chain.

The probability distribution of N can be computed by conditioning on the
values of V and invoking the law of total probabilities. Thus

\[ \Pr\{N = n\} = \int_0^\infty \Pr\{N = n \mid V = v\}\, b(v)\, dv. \]

Now, the number of customers arriving in an interval v is a random variable
following the Poisson law with parameter $\lambda v$, and therefore we have

\[ \Pr\{N = n \mid V = v\} = e^{-\lambda v} \frac{(\lambda v)^n}{n!}, \]

so that if $j \ge i - 1 \ge 0$, then
\[ P_{ij} = \Pr\{X_{n+1} = j \mid X_n = i\} = \Pr\{N = j - i + 1\} \]
\[ \phantom{P_{ij}} = \int_0^\infty \Pr\{N = j - i + 1 \mid V = v\}\, b(v)\, dv = \int_0^\infty e^{-\lambda v} \frac{(\lambda v)^{j-i+1}}{(j-i+1)!}\, b(v)\, dv \]
and $P_{ij} = 0$ for $j < i - 1$.
If a customer departs leaving no one behind him, the state remains zero
until another has arrived and been served. Thus, as far as transitions to another
state are concerned, the zeroth and first states are identical.

Thus if we set

\[ k_r = \int_0^\infty \frac{e^{-\lambda v} (\lambda v)^r}{r!}\, b(v)\, dv, \qquad r = 0, 1, 2, \ldots, \]

which is the probability of r people arriving during a service period, then

\[ P = \begin{pmatrix} k_0 & k_1 & k_2 & k_3 & \cdots \\ k_0 & k_1 & k_2 & k_3 & \cdots \\ 0 & k_0 & k_1 & k_2 & \cdots \\ 0 & 0 & k_0 & k_1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{pmatrix}. \tag{4.4} \]

The study of the Markov chain (4.4) per se was carried out in some detail in
Section 5 of Chapter 3. We proved there that P is positive recurrent, recurrent
null, or transient according to whether

\[ \rho = \sum_{r=0}^{\infty} r k_r = \int_0^\infty e^{-\lambda v} \sum_{r=0}^{\infty} r \frac{(\lambda v)^r}{r!}\, b(v)\, dv = \lambda \int_0^\infty v\, b(v)\, dv = \lambda E[V] < 1,\ = 1,\ \text{or } > 1. \]


The quantity

\[ \rho = \frac{\text{expected length of service time per customer}}{\text{expected length of interarrival time}} = \frac{E[V]}{1/\lambda} = \lambda E[V] \]

is called the traffic intensity. In the case where $\rho < 1$ we shall now determine the
limiting stationary distribution of the Markov chain.
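
As an illustration of the quantities $k_r$, the matrix (4.4), and the traffic intensity, here is a minimal sketch for a constant service time d. This choice is an assumption made for the example: a constant has no density b(v), so $k_r$ is obtained by evaluating the Poisson probability at v = d. The function names are hypothetical.

```python
import math


def poisson_arrival_probs(lam, service_time, n_terms):
    """k_r for a constant service time d: exp(-lam*d) * (lam*d)**r / r!."""
    a = lam * service_time
    probs = [math.exp(-a)]
    for r in range(1, n_terms):
        probs.append(probs[-1] * a / r)
    return probs


def embedded_matrix(k, size):
    """Top-left size x size corner of the transition matrix (4.4)."""
    rows = [k[:size]]                       # state 0 behaves like state 1
    for i in range(1, size):
        shift = i - 1                       # row i starts with i - 1 zeros
        rows.append([0.0] * shift + k[: size - shift])
    return rows


k = poisson_arrival_probs(lam=0.8, service_time=1.0, n_terms=60)
rho = sum(r * kr for r, kr in enumerate(k))   # approximates lam * E[V]
P = embedded_matrix(k, 5)
print(rho)
```

The truncated matrix is only the visible corner of an infinite matrix, so its rows sum to less than 1.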


A. THE STATIONARY DISTRIBUTION OF THE EMBEDDED MARKOV CHAIN 


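We wish to determine
\[ \pi = (\pi_0, \pi_1, \ldots), \qquad \pi_i \ge 0, \qquad \sum_{i=0}^{\infty} \pi_i = 1, \]
such that

\[ \sum_{i=0}^{\infty} \pi_i P_{ij} = \pi_j, \qquad j = 0, 1, 2, \ldots, \]
where $\|P_{ij}\| = P$ is defined in (4.4). In terms of the $k_i$, these equations are

\[ \pi_i = \pi_0 k_i + \sum_{r=1}^{i+1} \pi_r k_{i-r+1}, \qquad i = 0, 1, 2, \ldots \]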
We solve for the generating function
\[ \pi(s) = \sum_{i=0}^{\infty} \pi_i s^i \]
in terms of
\[ K(s) = \sum_{i=0}^{\infty} k_i s^i. \]

Multiplying through by $s^i$, we get

\[ s^i \pi_i = \pi_0 k_i s^i + \frac{1}{s} \sum_{r=0}^{i+1} \pi_r s^r k_{i-r+1} s^{i-r+1} - \frac{\pi_0 k_{i+1} s^{i+1}}{s}, \qquad i = 0, 1, 2, \ldots \]

Summing over i, and recognizing $\sum_{r=0}^{i+1} \pi_r k_{i+1-r}$ as a convolution, we have
\[ \sum_{i=0}^{\infty} \pi_i s^i = \pi(s) = \pi_0 K(s) + \frac{1}{s}\,[K(s)\pi(s) - \pi_0 k_0] - \frac{\pi_0}{s}\,[K(s) - k_0]. \]

Solving for $\pi(s)$ gives

\[ \pi(s) = \frac{\pi_0 K(s)(s - 1)}{s - K(s)}. \]


This formula determines the generating function of the stationary distribution
to within a constant factor. Since it is clear from the definition of the $k_i$ that

\[ \sum_{i=0}^{\infty} k_i = 1, \]
we notice that the expression for $\pi(s)$ contains in the denominator the factor
\[ \frac{s - K(s)}{s - 1} = 1 - \frac{1 - K(s)}{1 - s}, \]

which approaches $1 - K'(1)$ as $s \to 1$.
Let us evaluate $K'(1)$, the expected number of people to arrive during a
service period.
\[ K'(1) = \sum_{r=0}^{\infty} r k_r = \lambda E[V] = \frac{E[V]}{E[A]} = \rho \]
[$\rho$ = (expected service time)/(expected interarrival time) is, of course, the
traffic intensity], where $1/\lambda = E[A]$ is the expected duration of interarrival
times. Since $\rho < 1$, a stationary distribution exists, and therefore $\pi(1) = 1$.
But our formula gives

\[ \pi(1) = \lim_{s \to 1} \pi(s) = \frac{\pi_0}{1 - \rho}, \]

and so

\[ \pi_0 = 1 - \rho. \]
Thus the generating function of the stationary distribution is given by

\[ \pi(s) = \frac{(1 - \rho)(s - 1) K(s)}{s - K(s)}. \]

The quantity $1 - \rho$ represents the stationary probability that the line is empty.
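
The stationary equations can also be solved numerically, one component at a time. The sketch below starts from $\pi_0 = 1 - \rho$ and solves $\pi_i = \pi_0 k_i + \sum_{r=1}^{i+1} \pi_r k_{i-r+1}$ for $\pi_{i+1}$. As a check it assumes exponential service with rate `service_rate` (an illustrative special case, not required by the text), for which $k_r$ is geometric and the stationary distribution is known to be $(1-\rho)\rho^i$.

```python
def stationary_from_recursion(k, rho, n_states):
    """Return pi_0, ..., pi_{n_states-1} for the embedded M/GI/1 chain.

    Starts from pi_0 = 1 - rho and solves the i-th stationary equation
    for pi_{i+1}; requires len(k) >= n_states and k[0] > 0.
    """
    pi = [1.0 - rho]
    for i in range(n_states - 1):
        known = pi[0] * k[i] + sum(pi[r] * k[i - r + 1] for r in range(1, i + 1))
        pi.append((pi[i] - known) / k[0])
    return pi


lam, service_rate = 1.0, 2.0            # assumed example parameters
rho = lam / service_rate
# For exponential service, k_r = (mu / (lam + mu)) * (lam / (lam + mu))**r.
k = [(service_rate / (lam + service_rate)) * (lam / (lam + service_rate)) ** r
     for r in range(30)]
pi = stationary_from_recursion(k, rho, 10)
print(pi[:4])
```

The forward recursion divides by $k_0$ at every step, so rounding errors grow with the index; it is meant for modest numbers of states only.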


B. EXPECTED LENGTH OF LINE IN EQUILIBRIUM FOR THE (M/GI/1) QUEUE


We close this section by calculating explicitly the expected length of line and 
the expected waiting time for a customer just arriving under the condition 
that the queueing process has attained a stationary state. 

Differentiation of the generating function $\pi(s)$ does not easily lead to an
expression for E[q] where q is the stationary r.v. of the embedded Markov
chain. We can, however, find E[q] by another method.

If q is the number of people in line after a customer departs, and q' is the
number after the next departure, then

\[ q' = q - 1 + \delta + N, \]

where N is the number of arrivals during this service period, and

\[ \delta = \begin{cases} 1 & \text{if } q = 0, \\ 0 & \text{if } q > 0. \end{cases} \]

In the stationary state, q' has the same distribution as q. Thus $E[q'] = E[q]$,
and

\[ E[\delta] = 1 - E[N] = 1 - \rho. \tag{4.5} \]
From the same expression, since $\delta^2 = \delta$,
\[ q'^2 = q^2 + \delta + (N - 1)^2 + 2q\delta + 2\delta(N - 1) + 2q(N - 1), \]
and since $q\delta = 0$ (by the definition of $\delta$),
\[ q'^2 = q^2 + \delta + N(N - 1) + (1 - N) + 2\delta(N - 1) + 2q(N - 1). \]

But N, the number of customers that arrive during a service period, is independent
of q, and hence also of $\delta$. Thus, taking expectations and referring to
(4.5), we obtain

\[ E[q'^2] = E[q^2] + 1 - \rho + E[N(N - 1)] + 1 - \rho + 2(1 - \rho)(\rho - 1) + 2E[q](\rho - 1). \]

Since $E[q'^2] = E[q^2]$ by the stationarity postulate, these terms drop out. Then
solving for E[q] gives

\[ E[q] = \rho - \frac{E[N(N - 1)]}{2(\rho - 1)}. \]
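Now
\[ K(s) = \sum_{n=0}^{\infty} k_n s^n = \int_0^\infty e^{-\lambda v(1 - s)}\, b(v)\, dv; \]
so
\[ E[N(N - 1)] = \sum_{n=0}^{\infty} n(n - 1) k_n = K''(1) \]
and
\[ K''(s) = \lambda^2 \int_0^\infty v^2 e^{-\lambda v(1 - s)}\, b(v)\, dv. \]
Therefore
\[ E[N(N - 1)] = \lambda^2 \int_0^\infty v^2 b(v)\, dv = \lambda^2 \{\operatorname{var}(V) + (E[V])^2\} = \operatorname{var}(\lambda V) + \rho^2, \]
since
\[ \rho = \lambda E[V]. \]
Thus

\[ E[q] = \rho + \frac{\operatorname{var}(\lambda V) + \rho^2}{2(1 - \rho)}. \tag{4.6} \]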


C. EXPECTED WAITING TIME 


Still assuming equilibrium, we may find the expected waiting time for a single
customer. Suppose a customer waits a length of time W until his service begins
which lasts V units of time. Assume that when he departs he leaves q customers
behind him. Then in time W + V, q customers arrived, in accordance with the
laws of a Poisson process. For stationarity to hold, it follows that

    expected waiting time + expected service time per individual
        = E[W] + E[V]

should equal the

    expected number of customers to arrive in this period multiplied by the
    average length of the interarrival time

\[ = \frac{1}{\lambda} E[q]. \]




But we are in an equilibrium situation, and so substituting from (4.6) gives
\[ E[W] + E[V] = \frac{1}{\lambda} \left[ \rho + \frac{\operatorname{var}(\lambda V) + \rho^2}{2(1 - \rho)} \right]. \]
Dividing by $\mu = E[V]$ and remembering that $\lambda\mu = \rho$, the last relation becomes
\[ \frac{E[W]}{E[V]} = \frac{\operatorname{var}(\lambda V) + \rho^2}{2\rho(1 - \rho)}, \]
or after some simplification

\[ E[W] = \rho\, \frac{\operatorname{var}(V/\mu) + 1}{2(1 - \rho)}\, E[V]. \tag{4.7} \]

The formulas (4.6) and (4.7) express somewhat surprising facts. They say
that for given average arrival and service times we can decrease the expected
line length and expected waiting time by decreasing the variance of service
time. The best possible case in this respect is clearly the constant service time.
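
Formulas (4.6) and (4.7) are easy to evaluate directly. The following sketch (function names are illustrative) computes both from $\lambda$, $E[V]$, and $\operatorname{var}(V)$, and compares exponential service with constant service of the same mean.

```python
def mean_line_length(lam, mean_v, var_v):
    """E[q] from (4.6): rho + (var(lam V) + rho^2) / (2 (1 - rho))."""
    rho = lam * mean_v
    if rho >= 1.0:
        raise ValueError("formula (4.6) requires rho < 1")
    return rho + (lam ** 2 * var_v + rho ** 2) / (2.0 * (1.0 - rho))


def mean_wait(lam, mean_v, var_v):
    """E[W] from (4.7): rho (var(V / mu) + 1) / (2 (1 - rho)) E[V], mu = E[V]."""
    rho = lam * mean_v
    if rho >= 1.0:
        raise ValueError("formula (4.7) requires rho < 1")
    return rho * (var_v / mean_v ** 2 + 1.0) / (2.0 * (1.0 - rho)) * mean_v


lam, mean_v = 1.0, 0.5
for label, var_v in [("exponential", mean_v ** 2), ("constant", 0.0)]:
    print(label, mean_line_length(lam, mean_v, var_v), mean_wait(lam, mean_v, var_v))
```

With these parameters the constant service time halves the expected wait relative to exponential service, and in both cases $E[W] + E[V] = E[q]/\lambda$ holds.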


D. DISTRIBUTION OF WAITING TIME 


By the same approach as above we can find the Laplace transform of the waiting
time distribution. Let $\{\pi_j\}$ be the equilibrium probabilities whose generating
function $\pi(s)$ was described above. If a person waits a length of time W until
service and is served in time V, the probability that he leaves q people behind
him when he departs, which must be $\pi_q$, is the probability that q people arrived
during the time W + V. Since arrivals are Poisson with parameter $\lambda$,

\[ \pi_q = \int_0^\infty e^{-\lambda t} \frac{(\lambda t)^q}{q!}\, dC(t), \]
where C(t) is the distribution function of W + V. Then

\[ \pi(s) = \sum_{q=0}^{\infty} \pi_q s^q = \int_0^\infty e^{-\lambda t(1 - s)}\, dC(t) = \tilde C(\lambda(1 - s)), \]
where $\tilde C(s)$ is the Laplace transform of C(t). But C(t) is the distribution function
of the sum of the independent random variables W and V with distribution functions
W(t) and B(t), respectively. The Laplace transform of the sum of W and V is
the product of their Laplace transforms. Therefore
\[ \pi(s) = \tilde W(\lambda(1 - s))\, \tilde B(\lambda(1 - s)), \]
or
\[ \tilde W(z) = \frac{\pi((\lambda - z)/\lambda)}{\tilde B(z)}. \]



5: Exponential Service Times (G/M/1) 


Another model which can also be studied by the method of the embedded 
Markov chain is one with a general input distribution H(u) for interarrival 
times, but exponential service; i.e., service times are distributed exponentially 
with parameter $\mu$.

In this case let the transitions of the embedded Markov chain be induced 
at the times of arrival of new customers, and let the state until the next transition 
be the number of people the new arrival found in front of him. 

If q is the state of the system after one arrival and q' the state after the next
arrival, then
\[ q' = \max\{q + 1 - N,\ 0\}, \tag{5.1} \]

where N is the number served in the intervening period. Because of the exponential
distribution's property of “forgetfulness” (see Theorem 2.2, Chapter 4),
the number N served during an epoch between arrivals depends only on the
length of that interval and q and not on the extent of service which the present
customer has already received. The interarrival times are, of course, independent
random variables. By virtue of the foregoing facts we conclude that the transition
law (5.1) generates a Markov chain. We next calculate its transition probability
matrix $\|P_{ij}\|$.

Since $N \ge 0$, if $j > i + 1$, then obviously $P_{ij} = 0$. If $i + 1 \ge j \ge 1$, then
$i + 1 - j$ individuals were served during an interarrival epoch. We denote
the probability of this event by $a_{i+1-j}$. Evidently if $i + 1 \ge j \ge 1$, $P_{ij} = a_{i+1-j}$.

It is worthwhile to determine $a_k$ explicitly in terms of the interarrival distribution
and the exponential service time distribution. To this end, we note that
for an interarrival time of length $\xi$ the probability that exactly k services occur is

\[ F^{(k)}(\xi) - F^{(k+1)}(\xi), \tag{5.2} \]

where $F(\xi) = 1 - e^{-\mu\xi}$ is the service time distribution and $F^{(k)}(\xi)$ denotes
the k-fold convolution of $F(\xi)$. In fact, let $\Xi_1, \Xi_2, \ldots, \Xi_n, \ldots$ denote the durations
of the first, second, etc. services. The $\Xi_i$ are independent and identically distributed
with distribution law $F(\xi)$. The probability that at least k services occur
in time $\xi$ is the same as the probability that the time span until the finish of the
kth service does not exceed $\xi$, i.e.,

\[ \Pr\{\Xi_1 + \Xi_2 + \cdots + \Xi_k \le \xi\} = F^{(k)}(\xi). \]
Hence the probability of exactly k services in time $\xi$ is equal to

    Pr{time required for at least k services ≤ ξ}
    − Pr{time required for at least k + 1 services ≤ ξ}

and (5.2) obtains. Now $F^{(k)}(\xi)$ is known explicitly:

\[ F^{(k)}(\xi) = \int_0^{\xi} \frac{\mu^k \tau^{k-1} e^{-\mu\tau}}{(k - 1)!}\, d\tau. \]
Integrating the corresponding formula for $F^{(k+1)}(\xi)$ by parts, we reduce (5.2) to
\[ F^{(k)}(\xi) - F^{(k+1)}(\xi) = e^{-\mu\xi} \frac{(\mu\xi)^k}{k!}. \]

By the law of total probabilities we have

\[ a_k = \int_0^\infty e^{-\mu\xi} \frac{(\mu\xi)^k}{k!}\, dH(\xi), \]

where $H(\xi)$ is the distribution function of an interarrival period. The formula
for $a_k$ can be derived directly. However, the method described above has
independent merit of value in other contexts.

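Finally, $P_{i0}$ is the probability that all i + 1 persons present were served, which
is equal to the probability that at least i + 1 would have been served if more had
been present. It is most conveniently expressed in the form $P_{i0} = 1 - \sum_{j=1}^{i+1} P_{ij}$,
so that

\[ P = \begin{pmatrix} r_0 & a_0 & 0 & 0 & 0 & \cdots \\ r_1 & a_1 & a_0 & 0 & 0 & \cdots \\ r_2 & a_2 & a_1 & a_0 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \end{pmatrix} \tag{5.3} \]

where $r_i = 1 - a_0 - a_1 - \cdots - a_i$.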

We encountered the Markov chain with transition probability matrix
(5.3) in Section 6, Chapter 3, and a rather complete analysis was made pertaining
to its properties of positive recurrence, recurrence, and transience.
We proved, in particular, that if

\[ \sum_{k=0}^{\infty} k a_k > 1 \]

then the Markov chain is positive recurrent and the stationary limiting
distribution has the form

\[ \pi_i = (1 - \xi_0)\, \xi_0^{\,i}, \qquad i = 0, 1, 2, \ldots, \]
where $\xi_0$ is the unique solution of $f(\xi_0) = \xi_0$ ($0 < \xi_0 < 1$) for
\[ f(\xi) = \sum_{k=0}^{\infty} a_k \xi^k. \]
By the very meaning of $a_k$ we have

\[ f'(1) = \sum_{k=0}^{\infty} k a_k = \frac{\text{expected interarrival time}}{\text{expected service time}} = \frac{1}{\rho}. \]

Therefore the process is positive recurrent if and only if $\rho < 1$.
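
The root $\xi_0$ rarely has a closed form, but it can be found by iterating $\xi \mapsto f(\xi)$ from $\xi = 0$. The sketch below uses the fact that $f(\xi) = \sum_k a_k \xi^k = \int_0^\infty e^{-\mu v(1 - \xi)}\, dH(v)$, i.e., the Laplace-Stieltjes transform of H evaluated at $\mu(1 - \xi)$. The helper names and the two example interarrival laws (exponential and constant) are assumptions made for illustration.

```python
import math


def solve_xi0(interarrival_lst, mu, tol=1e-12, max_iter=10_000):
    """Smallest root in (0, 1) of xi = f(xi), with f(xi) = H~(mu * (1 - xi)).

    Iterates from xi = 0; since f is increasing and convex, the iterates
    increase monotonically to the smallest fixed point.
    """
    xi = 0.0
    for _ in range(max_iter):
        nxt = interarrival_lst(mu * (1.0 - xi))
        if abs(nxt - xi) < tol:
            return nxt
        xi = nxt
    raise RuntimeError("fixed-point iteration did not converge")


def exponential_lst(rate):
    """Transform of an exponential interarrival time with the given rate."""
    return lambda z: rate / (rate + z)


def constant_lst(d):
    """Transform of a constant interarrival time d."""
    return lambda z: math.exp(-d * z)


xi_mm1 = solve_xi0(exponential_lst(1.0), mu=2.0)
xi_dm1 = solve_xi0(constant_lst(1.0), mu=2.0)
print(xi_mm1, xi_dm1)
```

For exponential interarrival times the root coincides with $\rho$, which gives a quick check of the iteration. When $\rho \ge 1$ the iterates approach 1 instead, so the routine should only be used when $\rho < 1$.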


Waiting Time 


When $f'(1) > 1$, so that the distribution of the length of line approaches
a stationary distribution, we determine, under the condition of stationarity,
the distribution of the waiting time W prior to service.

The probability of not waiting is clearly

\[ \pi_0 = 1 - \xi_0. \]


If a customer arrives and finds $n \ge 1$ customers ahead of him, he must wait
the sum of n independent identically distributed exponential service times
before he can receive service. Such a sum is distributed according to a gamma
distribution of order n with scale parameter $\mu$. Thus,

\[ \Pr\{W \le t \mid n \text{ ahead}\} = \int_0^t \frac{\mu^n \tau^{n-1} e^{-\mu\tau}}{\Gamma(n)}\, d\tau, \qquad n \ge 1. \]


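Therefore, since
\[ \Pr\{n \text{ ahead}\} = \pi_n = (1 - \xi_0)\, \xi_0^{\,n}, \]

we have

\[ W(t) = \Pr\{W \le t\} = \sum_{n=1}^{\infty} \Pr\{W \le t \mid n \text{ ahead}\} \Pr\{n \text{ ahead}\} + \pi_0 \]

\[ \phantom{W(t)} = (1 - \xi_0) \int_0^t \sum_{n=1}^{\infty} \frac{\mu^n \tau^{n-1} e^{-\mu\tau}}{\Gamma(n)}\, \xi_0^{\,n}\, d\tau + (1 - \xi_0) \]

\[ \phantom{W(t)} = (1 - \xi_0) + \xi_0 \{1 - \exp[-\mu t(1 - \xi_0)]\}. \]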


This distribution is a mixture of an exponential distribution with parameter
$\mu(1 - \xi_0)$ and a degenerate distribution whose only possible value is zero,
the latter occurring with probability $1 - \xi_0$, which is the probability that a new
arrival will not have to wait for service. The conditional distribution function of
the length of wait, given that the server is busy, is then

\[ \Pr\{W \le t \mid \text{server busy}\} = 1 - \exp[-\mu t(1 - \xi_0)]. \]
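
The mixture form of W(t) translates directly into code. The sketch below evaluates the distribution function for a given $\xi_0$ and also the mean wait $\xi_0 / (\mu(1 - \xi_0))$; the mean is not stated in the text and is derived here from the mixture (probability $\xi_0$ of an exponential wait with parameter $\mu(1 - \xi_0)$). Function names are illustrative.

```python
import math


def gm1_waiting_cdf(t, mu, xi0):
    """W(t) = (1 - xi0) + xi0 * (1 - exp(-mu * t * (1 - xi0))) for t >= 0."""
    if t < 0.0:
        return 0.0
    return (1.0 - xi0) + xi0 * (1.0 - math.exp(-mu * t * (1.0 - xi0)))


def gm1_mean_wait(mu, xi0):
    """Mean of the mixture: xi0 times the mean 1 / (mu (1 - xi0))."""
    return xi0 / (mu * (1.0 - xi0))


mu, xi0 = 2.0, 0.5       # assumed values; xi0 = 0.5 corresponds to Poisson input with rate 1
print(gm1_waiting_cdf(0.0, mu, xi0), gm1_waiting_cdf(1.0, mu, xi0), gm1_mean_wait(mu, xi0))
```
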
6: Gamma Arrival Distribution and Generalizations (E_k/M/1)


This is a special case of the previous model which can be attacked by a rather
elegant trick, possessing quite wide applicability in other contexts. Consider a
one-server queueing process with exponential service times with parameter $\mu$
but with interarrival time distributed as a gamma distribution H(u) of order k,
the density being
\[ h(u) = \begin{cases} \dfrac{\lambda^k u^{k-1} e^{-\lambda u}}{\Gamma(k)}, & u \ge 0, \\ 0, & u < 0. \end{cases} \]

The distribution function H(u) can be regarded as the distribution of the
sum of k independent random variables all distributed exponentially ($\lambda$).
(The notation signifies that the parameter of the exponential is $\lambda$.) Therefore,
we may reduce the problem to a Markov process by considering each arrival
as consisting of k stages, 0, 1, ..., k − 1, in each of which the customer waits an
exponentially distributed time ($\lambda$) before proceeding to the next stage. The
physical arrival of a customer in the line corresponds to his reaching the kth
stage. There is exactly one person in one of the stages 0, 1, ..., k − 1 at all times,
a new person entering stage zero just as his predecessor leaves stage k − 1.

The state of the system is defined to be the sum of the stages of the people
in it. Thus if the system is in state nk + l, l < k, it means that n people are actually
standing in line or being served and another is in the “lth stage of arriving.”
As a person completes service the state of the system decreases by k.

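We have thus defined a continuous time Markov chain whose infinitesimal
generator is

\[ A = \begin{pmatrix}
-\lambda & \lambda & 0 & \cdots & 0 & 0 & 0 & 0 & \cdots \\
0 & -\lambda & \lambda & \cdots & 0 & 0 & 0 & 0 & \cdots \\
\vdots & & & \ddots & & & & & \\
0 & 0 & 0 & \cdots & -\lambda & \lambda & 0 & 0 & \cdots \\
\mu & 0 & 0 & \cdots & 0 & -(\mu + \lambda) & \lambda & 0 & \cdots \\
0 & \mu & 0 & \cdots & 0 & 0 & -(\mu + \lambda) & \lambda & \cdots \\
0 & 0 & \mu & \cdots & 0 & 0 & 0 & -(\mu + \lambda) & \cdots \\
\vdots & & & & & & & & \ddots
\end{pmatrix} \]

where the first k rows correspond to the states 0, 1, ..., k − 1.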


The equilibrium properties of this continuous time Markov chain can be 
determined. We do not enter into this analysis because as mentioned above this 
case is a special example of G/M/1, which was discussed in Section 5. The 
advantage in the above formulation lies in the fact that the time-dependent 
behavior of the process can also be determined exploiting the Markov character 
of the process. Its discussion is too advanced for this text. 
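
To make the structure of the generator concrete, the sketch below builds its top-left corner for given $\lambda$, $\mu$, and k. The function name and the truncation to a finite number of states are assumptions of this illustration; the last row of the truncated matrix loses its arrival transition and therefore does not sum to zero.

```python
def erlang_arrival_generator(lam, mu, k, size):
    """Top-left size x size corner of the generator for the staged-arrival chain.

    State i moves to i + 1 at rate lam (an arrival stage is completed) and,
    when i >= k, to i - k at rate mu (a service is completed).
    """
    a = [[0.0] * size for _ in range(size)]
    for i in range(size):
        a[i][i] = -(lam + (mu if i >= k else 0.0))
        if i + 1 < size:
            a[i][i + 1] = lam
        if i >= k:
            a[i][i - k] = mu
    return a


A = erlang_arrival_generator(lam=1.0, mu=2.0, k=3, size=8)
for row in A:
    print(row)
```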

A. GAMMA SERVICE AND GENERAL INPUT†

We may combine the techniques of the past few sections to determine the
stationary characteristics of a single-server queueing process with general
input distribution H(v) and service distribution a gamma of order k and parameter $\mu$.

† The rest of Section 6 can be skipped on first reading without loss of continuity.

Consider the service as consisting of k stages, 1, 2, ..., k, in each of which the
customer remains a length of time distributed exponentially ($\mu$). Upon completion
of the kth stage the customer is finished being served, and leaves.

We may construct an embedded Markov chain whose transitions are 
effected at each arrival of a customer. Let the state of the chain during an 
interarrival time be indicated by kq — p + 1, where q is the number of people in 
the system and p is the stage of service of the person being served at the moment 
of the last arrival. Since k — p + 1 is the number of stages (counting the present 
one) that the person being served has to pass through, the state of the system, 
which is k(q — 1) + (k — p + 1) = kq — p + 1, may be interpreted to be the 
number of exponential waiting times which all the customers ahead of the 
new arrival must undergo before the new arrival can begin service. If q = 0, 
the state of the system is defined as zero. 

The one-step transition probabilities for the chain fall into several cases.
For all i:

(i) If $j > i + k$, then $P_{ij} = 0$.

(ii) If $j \le i + k$, $j \ne 0$, then $i + k - j$ exponential waiting times have
passed in one interarrival time and then

\[ P_{ij} = a_r = \int_0^\infty \frac{e^{-\mu v} (\mu v)^r}{r!}\, dH(v), \]

where $r = i + k - j$. The derivation of this expression is identical with that of
$a_k$ in Section 5.

(iii) Finally, $P_{i0}$ is the probability that $i + k$ exponential waiting times
have a sum, denoted by $S_{i+k}$, not exceeding an interarrival time,

\[ P_{i0} = \int_0^\infty \Pr\{S_{i+k} \le v\}\, dH(v) = \int_0^\infty \int_0^v \frac{\mu^{i+k} \tau^{i+k-1} e^{-\mu\tau}}{(i + k - 1)!}\, d\tau\, dH(v). \]


B. STATIONARY PROBABILITIES 


When the traffic intensity

\[ \rho = \frac{E[\text{service time}]}{E[\text{interarrival time}]} = \frac{k}{\mu E[\text{interarrival time}]} \]

is less than 1, we expect that the probabilities of being in the various states
will approach a limiting distribution. Such a stationary distribution is proportional
to a nonnegative convergent series determined by the sequence
$x = (x_0, x_1, \ldots)$ satisfying

\[ x = xP. \tag{6.1} \]
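By analogy with previous models, a trial solution of the form
\[ x_i = \lambda^i \]

for some real number $\lambda$ suggests itself. The component equation of (6.1) for
$j \ge k$ simplifies to

\[ x_j = \sum_{i=0}^{\infty} \lambda^i P_{ij} = \sum_{i=j-k}^{\infty} \lambda^i a_{i+k-j} = \sum_{r=0}^{\infty} \lambda^{r+j-k} a_r = \lambda^{j-k} \sum_{r=0}^{\infty} \lambda^r a_r, \tag{6.2} \]
or
\[ \lambda^k = F(\lambda), \]
where

\[ F(\lambda) = \sum_{r=0}^{\infty} \lambda^r a_r = \int_0^\infty e^{-\mu v(1 - \lambda)}\, dH(v). \]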

It can be proved with the aid of Rouché's theorem of complex variables that
$\lambda^k - F(\lambda)$ has k roots, counting multiplicities, for $|\lambda| < 1$.

Rouché's theorem states in particular that, if f(z) and g(z) are analytic
functions in a common domain D and $|f(z)| > |g(z)|$ for z on the boundary of
D, then

\[ f(z) \quad \text{and} \quad f(z) + g(z) \]
have the same number of zeros in D, counting multiplicities. (For a proof of this
theorem the reader should consult standard books in complex variables.) We
shall now apply Rouché's theorem to the case of $D = \{z;\ |z| < 1 - \delta\}$, $\delta > 0$, and
$f(z) = z^k$, $g(z) = -F(z)$. Indeed for $|z| = 1 - \delta$, we have $|z|^k = (1 - \delta)^k =
1 - k\delta + o(\delta)$. Now for $|z| = 1 - \delta$, $|F(z)| \le F(1 - \delta)$ (since F(z) is a power
series with nonnegative coefficients). But
\[ F(1 - \delta) = F(1) - \delta F'(1) + o(\delta) \]

as $\delta \to 0$. Moreover, a direct calculation shows that
\[ F'(1) = \mu \int_0^\infty v\, dH(v) = \frac{k}{\rho} > k, \]

since we assumed $\rho < 1$. Hence it follows, provided $\delta$ is sufficiently small, that
\[ |z^k| > |F(z)|, \qquad |z| = 1 - \delta. \]

By virtue of Rouché's theorem, we conclude that $z^k$ and $z^k - F(z)$ have the same
number k of zeros in $\{z;\ |z| < 1 - \delta\}$.
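
For a concrete check of the root count, take k = 2 and assume exponential interarrival times with rate a (a special case chosen only because the roots are then explicit). In that case $F(z) = a/(a + \mu(1 - z))$, the equation $z^2 = F(z)$ becomes the cubic $-\mu z^3 + (a + \mu) z^2 - a = 0$, and this factors as $(z - 1)(-\mu z^2 + a z + a) = 0$. The sketch below computes the two roots other than z = 1 and verifies that they lie inside the unit disk when $\rho = 2a/\mu < 1$.

```python
import cmath


def transform_F(z, a, mu):
    """F(z) = integral of exp(-mu v (1 - z)) dH(v) for exponential H with rate a."""
    return a / (a + mu * (1.0 - z))


def roots_inside_unit_disk(a, mu):
    """Roots of mu z^2 - a z - a = 0, i.e. the roots of z^2 = F(z) other than z = 1."""
    disc = cmath.sqrt(a * a + 4.0 * a * mu)
    return [(a + disc) / (2.0 * mu), (a - disc) / (2.0 * mu)]


a, mu = 1.0, 4.0                      # rho = k / (mu * E[interarrival]) = 2 * 1 / 4 = 0.5
roots = roots_inside_unit_disk(a, mu)
for z in roots:
    print(z, abs(z), abs(z ** 2 - transform_F(z, a, mu)))
```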

If the k roots are $\lambda_1, \lambda_2, \ldots, \lambda_k$, then $\{x_j = \lambda_r^{\,j}\}_{j=0}^{\infty}$ will satisfy Eqs. (6.2) for
any r (r = 1, 2, ..., k). We might attempt to find a linear combination


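\[ \pi_n = \alpha_1 \lambda_1^{\,n} + \alpha_2 \lambda_2^{\,n} + \cdots + \alpha_k \lambda_k^{\,n}, \qquad \sum_{n=0}^{\infty} \pi_n = 1, \]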
such that the system of equations x = xP is completely satisfied. Certainly the
equations $x_j = \sum_{i=0}^{\infty} x_i P_{ij}$ (j = 0, 1, ...) for $j \ge k$ are satisfied for all choices of
$\alpha_r$, since each sequence $\{\lambda_r^{\,j}\}$ for r = 1, ..., k provides a solution and linear combinations
of solutions remain solutions. It remains to determine the constants
$\alpha_1, \alpha_2, \ldots, \alpha_k$ so that the first k equations of x = xP will also be satisfied. For
the case where all the $\lambda_i$ are distinct, some algebraic manipulations lead to the
explicit solution (the normalization $\sum_{n=0}^{\infty} \pi_n = 1$ is employed)

\[ \pi_n = \frac{\sum_{i=1}^{k} a_i \lambda_i^{\,n}}{\sum_{i=1}^{k} a_i/(1 - \lambda_i)}, \qquad n = 0, 1, \ldots \]


where 


k A; ) 
a= M 
aL 1 (; i Aj 


The details of the algebra and the modifications necessary when roots coincide 
are tedious and will not be given. 


C. WAITING TIMES 


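We pointed out above that the state of the system as defined is the number
of exponential ($\mu$) waiting times the new arrival has to wait. Therefore, if the
state of the system is n > 0 the waiting time distribution of the person who just
arrived is gamma of order n with parameter $\mu$. If n = 0, he does not wait.
Therefore

\[ W(\xi) = \Pr\{W \le \xi\} = \int_0^{\xi} \sum_{j=1}^{\infty} \pi_j \frac{\mu^j w^{j-1} e^{-\mu w}}{(j - 1)!}\, dw + \pi_0. \]

For the case of distinct roots we may substitute the expression of $\pi_j$ derived
above and obtain

\[ W(\xi) = \frac{1}{\sum_{i=1}^{k} a_i/(1 - \lambda_i)} \left[ \int_0^{\xi} \sum_{j=1}^{\infty} \sum_{i=1}^{k} a_i \lambda_i^{\,j} \frac{\mu^j w^{j-1} e^{-\mu w}}{(j - 1)!}\, dw + \sum_{i=1}^{k} a_i \right] \]

\[ \phantom{W(\xi)} = \frac{1}{\sum_{i=1}^{k} a_i/(1 - \lambda_i)} \left[ \sum_{i=1}^{k} a_i + \mu \int_0^{\xi} \sum_{i=1}^{k} a_i \lambda_i \exp[-\mu w(1 - \lambda_i)]\, dw \right] \]

\[ \phantom{W(\xi)} = \frac{1}{\sum_{i=1}^{k} a_i/(1 - \lambda_i)} \sum_{i=1}^{k} \left( a_i + \frac{a_i \lambda_i}{1 - \lambda_i} \{1 - \exp[-\mu\xi(1 - \lambda_i)]\} \right) \]

\[ \phantom{W(\xi)} = 1 - \frac{\sum_{i=1}^{k} [a_i \lambda_i/(1 - \lambda_i)] \exp[-\mu\xi(1 - \lambda_i)]}{\sum_{i=1}^{k} a_i/(1 - \lambda_i)}. \]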
7: Exponential Service with s Servers (GI/M/s)


We indicate the generalization of the preceding techniques in dealing with
the case of one server by considering an s-server queueing problem whose
input distribution function is H(v) and for which the distribution of service
times is exponential with parameter $\mu$. (We assume that the service time distributions
for all the s servers are identical.)

As before, the process is not Markov, but we may investigate an embedded 
Markov chain. Let the transitions of the chain be effected at the instants of 
arrivals of new customers, and let q, the state of the system, be the number of 
customers, waiting and being served, that the last customer encounters on 
arrival. 

We may calculate $P_{ij}$ as follows:

(i) If $j > i + 1$, $P_{ij} = 0$ for all i = 0, 1, 2, ...

(ii) If $j \le i + 1 \le s$, everyone is being served, and there are exactly
$i - j + 1$ departures during the interarrival period, where the probability of any
given person departing by time t is $1 - e^{-\mu t}$. Thus

\[ P_{ij} = \int_0^\infty \Pr\{i + 1 - j \text{ depart in time } t \mid i + 1 \text{ present originally}\}\, dH(t) \]

\[ \phantom{P_{ij}} = \int_0^\infty \binom{i+1}{j} (1 - e^{-\mu t})^{i+1-j} e^{-\mu t j}\, dH(t). \tag{7.1} \]

The integrand is the binomial distribution corresponding to $i + 1 - j$ successes
(completions of service) in an interarrival epoch.

(iii) If $i + 1 \ge j \ge s$ and $i \ge s$, all servers are busy throughout the interarrival
period. Therefore

\[ P_{ij} = \Pr\{i + 1 - j \text{ depart}\} = \int_0^\infty \Pr\{i + 1 - j \text{ depart in time } t\}\, dH(t) = \int_0^\infty \frac{e^{-\mu s t} (\mu s t)^{i+1-j}}{(i + 1 - j)!}\, dH(t). \]

(The derivation of the last identity is the same as that of (5.2). The distribution
of the time of service completion now is exponential with parameter $s\mu$ since
there are s busy servers in the present case.)

(iv) If $i + 1 > s > j$, there will be $m = i - s + 1$ customers waiting as
well as s being served at the beginning of the interarrival period, but $n = s - j$
servers idle at the end. Let v denote the time until none are waiting, i.e., the time
for m people to be served while all s servers are working. The time between
successive service completions is distributed exponentially with parameter $s\mu$, so v
is distributed as a gamma of order m with parameter $s\mu$. Suppose the duration of
service for the m customers with all servers busy is v, while the remaining n
customers are served in a period of length u − v, where u denotes the next
interarrival time. Then

\[ P_{ij} = \int_0^\infty \Pr\{m + n \text{ people served in time } u\}\, dH(u) \]

\[ \phantom{P_{ij}} = \int_0^\infty \left[ \int_0^u \Pr\{n \text{ people served in time } u - v\} \frac{(s\mu)^m v^{m-1} e^{-s\mu v}}{(m - 1)!}\, dv \right] dH(u) \]

\[ \phantom{P_{ij}} = \frac{(s\mu)^m}{(m - 1)!} \binom{s}{n} \int_0^\infty \int_0^u v^{m-1} e^{-s\mu v} e^{-\mu(u - v)(s - n)} (1 - e^{-\mu(u - v)})^n\, dv\, dH(u), \]

where the last identity follows by the binomial distribution as in (7.1).


A. STATIONARY PROBABILITIES 


We expect that if the traffic intensity is < 1, i.e.,

\[ \rho = \frac{E[\text{service time per customer when all servers are occupied}]}{E[\text{interarrival time}]} = \frac{1}{s\mu E[\text{interarrival time}]} < 1, \]

then after a long time the probabilities of being in each state should stabilize.
We look for a positive vector $x = (x_0, x_1, x_2, \ldots)$ satisfying $\sum x_i < \infty$ and
x = xP. By comparison with the special case for one server, which was discussed
earlier, we are led to consider a possible solution of the form

\[ x = (\beta_0, \beta_1, \ldots, \beta_{s-2}, 1, \alpha, \alpha^2, \ldots). \]
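The jth component equation ($j \ge s$) of x = xP is

\[ x_j = \sum_{i=0}^{\infty} x_i P_{ij} = \sum_{i=j-1}^{\infty} x_i P_{ij} = \sum_{i=j-1}^{\infty} \alpha^{i-s+1} \int_0^\infty \frac{e^{-\mu s u} (\mu s u)^{i+1-j}}{(i + 1 - j)!}\, dH(u) = \alpha^{j-s} \int_0^\infty e^{-\mu s u(1 - \alpha)}\, dH(u). \]

Since $x_j = \alpha^{j-s+1}$, this equation is of the form $\alpha = F(\alpha)$, where
\[ F(\alpha) = \int_0^\infty e^{-\mu s u(1 - \alpha)}\, dH(u) \]

is a convex increasing function in (0, 1), with F(0) > 0, and F(1) = 1. The convexity
of F can be verified by differentiating twice. Therefore, there is a solution
$\alpha$ in (0, 1) if and only if $F'(1) > 1$. Since

\[ F'(1) = \mu s \int_0^\infty u\, dH(u), \]

this is just the criterion $\rho < 1$.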
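Having found the solution $\alpha$, we may find the remaining components,
$\beta_{s-2}, \beta_{s-3}, \ldots, \beta_0$, from the recursion relations

\[ \beta_j = \sum_{i=j-1}^{s-2} \beta_i P_{ij} + \sum_{i=s-1}^{\infty} \alpha^{i-s+1} P_{ij}, \qquad j = 1, 2, \ldots, s - 1, \]
or

\[ \beta_{j-1} = \frac{\beta_j - \sum_{i=j}^{s-2} \beta_i P_{ij} - \sum_{i=s-1}^{\infty} \alpha^{i-s+1} P_{ij}}{P_{j-1,j}}, \]

starting from $\beta_{s-1} = 1$. Normalizing, we have the final probabilities

\[ \pi_j = \frac{x_j}{(1 - \alpha)^{-1} + \sum_{r=0}^{s-2} \beta_r}. \]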

B. WAITING TIMES UNDER CONDITIONS OF STATIONARITY 


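The probability that an arrival does not have to wait for service is the probability
that at the arrival instant at least one server is free, which is

\[ W(0) = \Pr\{q \le s - 1\} = \sum_{i=0}^{s-1} \pi_i = \frac{1 + \sum_{i=0}^{s-2} \beta_i}{(1 - \alpha)^{-1} + \sum_{i=0}^{s-2} \beta_i} = A \sum_{i=0}^{s-1} \beta_i, \]
where
\[ A = \frac{1}{(1 - \alpha)^{-1} + \sum_{r=0}^{s-2} \beta_r}, \qquad \beta_{s-1} = 1. \]
If the state of the system is $n \ge s$, the new arrival has to wait until $n - s + 1$
customers are served before he can be served. But since there are s servers
working, the waiting time between completions of service is exponential ($\mu s$).
Thus his waiting time is a gamma distribution of order $n - s + 1$ with scale
parameter $\mu s$ and

\[ \Pr\{W \le \xi\} = W(\xi) = W(0) + A \int_0^{\xi} \sum_{n=s}^{\infty} \alpha^{n-s+1} \frac{(\mu s)^{n-s+1} w^{n-s} e^{-\mu s w}}{(n - s)!}\, dw \]

\[ \phantom{W(\xi)} = A \left[ \sum_{i=0}^{s-1} \beta_i + \int_0^{\xi} \alpha \mu s\, e^{-\mu s w(1 - \alpha)}\, dw \right] \]

\[ \phantom{W(\xi)} = 1 - \frac{\alpha\, e^{-\mu s \xi(1 - \alpha)}}{1 + (1 - \alpha) \sum_{i=0}^{s-2} \beta_i}. \]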

8: The Virtual Waiting Time and the Busy Period 


This section is devoted to a different approach to the problem of waiting times
in the simple single-server queueing process with Poisson input and general
service (M/GI/1). We shall illustrate the point of view by developing some
results concerning the busy period of this process, utilizing for this purpose the
discussion of Section 3, Chapter 13.

[FIG. 1. Vertical axis: η(t).]

Since the queueing process as a whole is not Markov, our previous technique
was to consider an embedded Markov chain and analyze the waiting
times of customers in terms of the embedded chain. However, if we consider
$\eta(t)$ [$\eta(t)$ is sometimes interpreted as the virtual waiting time], the time a customer
would have waited had he arrived at the instant t, then $\eta(t)$ determines a continuous
time Markov process. For if $t_n$ and $v_n$ are the instant of arrival and service
time of the nth customer, then for $t_n < t < t_{n+1}$, we have

\[ \eta(t) = [\eta(t_n +) - (t - t_n)]_+ \ \text{†} \]

and
\[ \eta(t_n +) = \eta(t_n -) + v_n, \]

[$\eta(t+) = \lim_{\varepsilon \downarrow 0} \eta(t + \varepsilon)$, and $\eta(t-) = \lim_{\varepsilon \downarrow 0} \eta(t - \varepsilon)$].
e0 elo 
To fix the notation, the input distribution is assumed to be exponential
with parameter $\lambda$ and the service distribution is generally given by H(v). Typically,
$\eta(t)$ has the appearance of Fig. 1. It is clear that the future behavior of $\eta(t)$ does
not depend on anything previous to its present value. Indeed, the values $t_i$
being the successive events of a Poisson process, it follows that the time until
the next arrival is independent of when the last arrival took place.
Another interpretation for $\eta(t)$ is that this is the time needed to complete
service of all those customers who are present in the system at time t. The
actual waiting time of the nth customer is $\eta(t_n) = \eta(t_n -)$.


+ The symbol [x], is defined to be 


x if x>0, 
[x] = p if oxen. 
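The two relations for η(t) can be traced numerically. The following is a minimal sketch, assuming the arrival instants and service times are supplied as plain lists; the function name `virtual_waiting_time` and the initial workload argument `eta0` are illustrative choices and not notation from the text. The value returned at an arrival instant is the right-hand limit η(t+).

```python
def virtual_waiting_time(arrivals, services, t, eta0=0.0):
    """Return eta(t) for a single-server queue (illustrative sketch).

    arrivals: increasing arrival instants t_1 < t_2 < ...
    services: matching service times v_1, v_2, ...
    eta0:     workload present at time 0 (hypothetical parameter).
    """
    eta = eta0
    last = 0.0
    for t_n, v_n in zip(arrivals, services):
        if t_n > t:
            break
        # Between arrivals the workload drains at unit rate, never below 0:
        # eta(t_n-) = [eta(last+) - (t_n - last)]^+
        eta = max(eta - (t_n - last), 0.0)
        # At an arrival the workload jumps by the service time:
        # eta(t_n+) = eta(t_n-) + v_n
        eta += v_n
        last = t_n
    # Drain from the last arrival instant up to time t.
    return max(eta - (t - last), 0.0)
```

For example, with arrivals at 1 and 2 and service times 3 and 1, the workload is 3 just after each arrival, has fallen to 1 at time 4, and is 0 from time 5 onward.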
Now if 

$$F(t, x) = \Pr\{\eta(t) \le x\},$$

it is possible to derive a differential-integral equation satisfied by F(t, x). 
This equation can be analyzed to study the properties of F(t, x). We proved 
in Section 3 that the distribution of the waiting time of the nth person, 

$$F_n(x) = F(t_n, x),$$

converges to a limiting distribution as $n \to \infty$. By use of appropriate renewal 
theorems the convergence of F(t, x) can also be proved and shown to possess 
the same limiting distribution as $F_n(x)$. The development of this proposition is 
beyond the scope of this book. (See references at the close of the chapter.) 

The remainder of this chapter is devoted to a discussion of various random 
variables of interest connected with the process M/GI/1. 

Notice that if η(t) > 0 then the server is busy at time t and if η(t) = 0 then 
the server is idle at time t. Let 

$$P_0(t) = \Pr\{\eta(t) = 0\},$$

i.e., $P_0(t)$ is the probability that the server is idle at time t. 

The busy period is defined as the time interval during which the server is 
continuously busy. If η(0) > 0, i.e., the server is busy at time t = 0, then there is 
an initial busy period which ends when η(t) vanishes for the first time. Denote 
by $\hat{G}(x)$ the probability that the length of the initial busy period is ≤ x. Following 
the initial busy period (if any) idle periods and busy periods alternate. The 
lengths of the busy periods following the initial busy period are identically 
distributed, mutually independent random variables, since each subsequent 
busy period commences under identical conditions. Denote by G(x) the 
probability that the length of a busy period other than the initial one is ≤ x. 
The idle periods are also identically distributed, mutually independent random 
variables whose distribution function is exponential with parameter λ. 

The first principal task is the proof of Theorem 8.1 below. It depends on a 
result of Section 2, Chapter 13, which we record here for convenience as Lemma 
8.1. 


Lemma 8.1. Let $\chi_1, \chi_2, \ldots, \chi_n$ be nonnegative, exchangeable random variables 
with sum $\chi_1 + \chi_2 + \cdots + \chi_n = y$. Let $\tau_1, \tau_2, \ldots, \tau_n$ be the coordinates, arranged 
in increasing order, of n points distributed uniformly and independently of each 
other in the interval (0, t). If $\{\chi_k\}$ and $\{\tau_k\}$ are independent sequences, then 

$$\Pr\{\chi_1 + \cdots + \chi_k \le \tau_k \text{ for } k = 1, 2, \ldots, n\} = \begin{cases} 1 - y/t & \text{if } 0 \le y \le t, \\ 0 & \text{if } y > t. \end{cases} \tag{8.1}$$

The proof of Lemma 8.1 and other of its applications are given in Chapter 13, 
Section 2. We need one more lemma. 


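Lemma 8.2. Let $\chi_1, \chi_2, \ldots, \chi_n$ be nonnegative, exchangeable random variables 
with sum $\chi_1 + \chi_2 + \cdots + \chi_n = t$ and let $\tau_1, \tau_2, \ldots, \tau_{n-1}$ be the coordinates, 
arranged in increasing order, of n − 1 points distributed uniformly and 
independently of each other in the interval (0, t). If $\{\chi_k\}$ and $\{\tau_k\}$ are independent 
sequences, then 

$$\Pr\{\chi_1 + \cdots + \chi_k \le \tau_k \text{ for } k = 1, 2, \ldots, n-1\} = \frac{1}{n}. \tag{8.2}$$

Proof. By Lemma 8.1 we have 

$$\Pr\{\chi_1 + \cdots + \chi_k \le \tau_k \text{ for } k = 1, 2, \ldots, n-1 \mid \chi_1 + \cdots + \chi_{n-1} = y\} = \begin{cases} 1 - y/t & \text{if } 0 \le y \le t, \\ 0 & \text{if } y > t. \end{cases}$$

Now by the law of total probabilities 

$$\Pr\{\chi_1 + \cdots + \chi_k \le \tau_k,\; k = 1, 2, \ldots, n-1\}$$

$$= \int_0^t \Pr\{\chi_1 + \cdots + \chi_k \le \tau_k,\; k = 1, \ldots, n-1 \mid \chi_1 + \cdots + \chi_{n-1} = y\}\; d\Pr\{\chi_1 + \cdots + \chi_{n-1} \le y\}$$

$$= \int_0^t \left(1 - \frac{y}{t}\right) d\Pr\{\chi_1 + \cdots + \chi_{n-1} \le y\} = 1 - \frac{1}{t}\, E[\chi_1 + \cdots + \chi_{n-1}] = 1 - \frac{1}{t} \cdot \frac{(n-1)t}{n},$$

since $\chi_1, \ldots, \chi_n$ are exchangeable and their sum is t. This obviously reduces to 

$$\Pr\{\chi_1 + \cdots + \chi_k \le \tau_k \text{ for } k = 1, 2, \ldots, n-1\} = \frac{1}{n}. \; ∎$$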
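Both lemmas lend themselves to a quick Monte Carlo check. The sketch below is illustrative only: it assumes one particular family of exchangeable variables with a fixed sum (independent exponentials rescaled to the required total), and the function names, trial count, and seed are hypothetical choices. Any other exchangeable family with the same sum should give the same probabilities if the lemmas are applied correctly.

```python
import random

def exchangeable_with_sum(n, total, rng):
    """n nonnegative exchangeable variables summing to `total`
    (assumed construction: rescaled i.i.d. exponentials)."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(w)
    return [total * x / s for x in w]

def partial_sums_below(chi, tau):
    """True if chi_1 + ... + chi_k <= tau_k for every k that tau provides."""
    acc = 0.0
    for x, t_k in zip(chi, tau):
        acc += x
        if acc > t_k:
            return False
    return True

def estimate_lemma_8_1(n, y, t, trials=20000, seed=0):
    """Estimate the probability in (8.1); theory gives 1 - y/t for y <= t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        chi = exchangeable_with_sum(n, y, rng)
        tau = sorted(rng.uniform(0.0, t) for _ in range(n))
        hits += partial_sums_below(chi, tau)
    return hits / trials

def estimate_lemma_8_2(n, t, trials=20000, seed=0):
    """Estimate the probability in (8.2); theory gives 1/n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        chi = exchangeable_with_sum(n, t, rng)
        tau = sorted(rng.uniform(0.0, t) for _ in range(n - 1))
        hits += partial_sums_below(chi, tau)  # only the first n-1 sums are tested
    return hits / trials
```

With n = 3, y = 1, t = 2 the first estimate should be close to 1/2, and with n = 4 the second should be close to 1/4, up to sampling error.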


Theorem 8.1. If η(0) = c (constant), then the probability that the initial busy 
period has duration ≤ x is given by 

$$\hat{G}(x) = \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} \int_0^{x-c} e^{-\lambda(c+y)}\, c\, (c+y)^{n-1}\, dH_n(y) \quad \text{if } x \ge c,$$

$$\hat{G}(x) = 0 \quad \text{if } x < c, \tag{8.3}$$

where $H_n(y)$ denotes the n-fold convolution of H(y), with the convention that $H_0(y)$ 
is the distribution with unit jump at 0. 
Proof. We make the calculation by conditioning in terms of the number of 
arrivals. The number of arrivals during the initial busy period may be n = 0, 1, 
2, .... If n = 0 then the initial busy period has length c and the probability 
that no customer arrives in the time interval (0, c) is $e^{-\lambda c}$. This contributes the 
term for n = 0 in (8.3). If n ≥ 1 then denote by $\tau_1, \tau_2, \ldots, \tau_n$ the arrival times 
and by $\chi_1, \chi_2, \ldots, \chi_n$ the service times of these customers. They must satisfy 
the conditions 

$$\tau_j \le \chi_1 + \cdots + \chi_{j-1} + c \quad \text{for } j = 1, 2, \ldots, n, \tag{8.4}$$


where the empty sum is equal to zero. In fact, relation (8.4) asserts that the 
cumulative service times for the j — 1 customers that arrive after time 0 plus 
the cumulative service times of those customers waiting at time 0 exceeds the 
time of arrival of the jth customer, j = 1, 2,...,n. This condition plainly assures 
that the server stays busy at least until completing service for the nth arrival. Of 
course, $\Pr\{\chi_1 + \chi_2 + \cdots + \chi_n \le x\}$ is $H_n(x)$. 

If $\chi_1 + \cdots + \chi_n = y$, then the length of the initial busy period is c + y 
and the probability that exactly n customers arrive during the time interval 
(0, c + y) is $e^{-\lambda(c+y)} [\lambda(c+y)]^n / n!$. The arrival instants can be considered as 
the coordinates, arranged in increasing order, of n points distributed uniformly 
and independently of each other in the interval (0, c + y) (see page 126, 
Chapter 4). Further, $\chi_1, \chi_2, \ldots, \chi_n$ are nonnegative, exchangeable random 
variables. 

Now subtracting the inequalities (8.4) from y + c, we obtain the equivalent 
relations 

$$y + c - \tau_j \ge y - (\chi_1 + \cdots + \chi_{j-1}), \quad j = 1, 2, \ldots, n. \tag{8.5}$$

Let $\tau^*_{n+1-j} = y + c - \tau_j$, and since $\chi_1 + \cdots + \chi_n = y$ we may rewrite (8.5) 
in the form 

$$\tau^*_{n+1-j} \ge \chi_j + \chi_{j+1} + \cdots + \chi_n, \quad j = 1, 2, \ldots, n. \tag{8.6}$$

But the $\tau^*_{n+1-j}$, j = 1, ..., n, are again clearly distributed like the n order statistics 
of the uniform distribution on (0, c + y). To convince ourselves of this we merely 
look at the values of $\tau_j$ from right to left, scanning the interval from c + y 
to 0. Symmetry considerations yield the desired conclusions. Moreover, $\chi_n$, 
$\chi_{n-1}, \ldots, \chi_{n+1-j}$ are distributed jointly like $\chi_1, \chi_2, \ldots, \chi_j$ because they are 
exchangeable. Hence the event (8.6) has the same probability as the event 

$$\tau_j \ge \chi_1 + \chi_2 + \cdots + \chi_j, \quad j = 1, 2, \ldots, n. \tag{8.7}$$

Appealing to Lemma 8.1, we conclude that the probability of the event 
(8.7) is 

$$1 - \frac{y}{c+y} = \frac{c}{c+y}.$$
Now putting these facts together with the aid of the law of total probabilities, 
we have 

$$\hat{G}(x) = \sum_{n=0}^{\infty} \int_0^{\infty} \Pr\{\text{initial busy period} \le x \mid \eta(0) = c,\ n \text{ arrivals in this period require total service time } y\}$$

$$\qquad \times \Pr\{n \text{ arrivals in the busy period} \mid \eta(0) = c,\ \text{total service time of arrivals is } y\}\; dH_n(y)$$

$$= \sum_{n=0}^{\infty} \int_0^{x-c} e^{-\lambda(c+y)}\, \frac{[\lambda(c+y)]^n}{n!} \cdot \frac{c}{c+y}\; dH_n(y)$$

$$= \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} \int_0^{x-c} e^{-\lambda(c+y)}\, c\, (c+y)^{n-1}\, dH_n(y). \; ∎$$
n=0 n! 0 


Our next theorem yields the distribution function of a busy period other 
than the initial busy period. 


Theorem 8.2. The probability that a busy period other than the initial period 
has length ≤ x is given by 

$$G(x) = \sum_{n=1}^{\infty} \frac{\lambda^{n-1}}{n!} \int_0^x e^{-\lambda y}\, y^{n-1}\, dH_n(y), \quad x \ge 0. \tag{8.8}$$


Proof. If we suppose that a busy period consists of n, n = 1, 2, ..., services, 
then its length is $\chi_1 + \chi_2 + \cdots + \chi_n$ where $\chi_1, \chi_2, \ldots, \chi_n$ are identically 
distributed, mutually independent random variables with the distribution function 
$\Pr\{\chi_i \le x\} = H(x)$, i = 1, 2, ..., n. In this case exactly n − 1 customers arrive 
during this busy period. Measure time from the starting point of the busy 
period and denote by $\tau_1, \tau_2, \ldots, \tau_{n-1}$ the arrival times. They must satisfy the 
conditions 

$$\tau_j \le \chi_1 + \cdots + \chi_j, \quad j = 1, 2, \ldots, n-1. \tag{8.9}$$

If $\chi_1 + \cdots + \chi_n = y$, then the busy period has length y and the arrival instants 
can be considered as the coordinates, arranged in increasing order, of n − 1 points 
distributed uniformly and independently of each other in the interval (0, y). 
Further, $\chi_1, \ldots, \chi_n$ are nonnegative, exchangeable random variables. If 
$\chi_1 + \cdots + \chi_n = y$, then (8.9) has the same probability as 

$$\chi_1 + \cdots + \chi_k \le \tau_k, \quad k = 1, 2, \ldots, n-1, \tag{8.10}$$

for in (8.9) we can replace $\chi_j$ by $\chi_{n+1-j}$ and $\tau_j$ by $y - \tau_{n-j}$, j = 1, 2, ..., n − 1, 
without altering the probability of the event. By Lemma 8.2 the probability of 
the event (8.10) or (8.9) is 1/n. Since $\Pr\{\chi_1 + \cdots + \chi_n \le y\} = H_n(y)$ and the 
probability that during the time interval (0, y) exactly n − 1 customers arrive is 
$e^{-\lambda y} (\lambda y)^{n-1} / (n-1)!$, we can apply the law of total probabilities to obtain 

$$G(x) = \sum_{n=1}^{\infty} \frac{1}{n} \int_0^x e^{-\lambda y}\, \frac{(\lambda y)^{n-1}}{(n-1)!}\, dH_n(y), \tag{8.11}$$

which was to be proved. ∎ 
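As a numerical illustration of (8.8), consider the special case of exponential service, H(y) = 1 − e^{−μy}, which is an assumption made here and not part of the theorem. Then H_n has the gamma density μ^n y^{n−1} e^{−μy}/(n − 1)!, and differentiating (8.8) term by term gives a series for the busy-period density. The sketch below evaluates that series and integrates it with a plain trapezoid rule; the function names, the truncation at 80 terms, and the integration range are illustrative choices.

```python
import math

def busy_period_density(y, lam, mu, terms=80):
    """Density of G in (8.8) for exponential(mu) service (assumed special case):
    sum over n >= 1 of lam^(n-1) mu^n y^(2n-2) e^{-(lam+mu) y} / (n! (n-1)!)."""
    if y < 0.0:
        return 0.0
    if y == 0.0:
        return mu  # only the n = 1 term survives at y = 0
    log_y = math.log(y)
    total = 0.0
    for n in range(1, terms + 1):
        # Work with logarithms to avoid overflow in the powers and factorials.
        log_term = ((n - 1) * math.log(lam) + n * math.log(mu)
                    + (2 * n - 2) * log_y - (lam + mu) * y
                    - math.lgamma(n + 1) - math.lgamma(n))
        total += math.exp(log_term)
    return total

def trapezoid(f, a, b, steps):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / steps
    s = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        s += f(a + k * h)
    return s * h
```

With λ = 0.2 and μ = 1 the density should integrate to approximately 1, and its mean should be close to 1/(μ − λ) = 1.25, the mean busy period of the M/M/1 queue.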


Problems 


1. Show for the M/M/s system that the stationary queue size distribution $\{p_n, n = 
0, 1, 2, \ldots\}$ is given by 

$$p_0 = \left[ \frac{(s\rho)^s}{s!\,(1-\rho)} + \sum_{i=0}^{s-1} \frac{(s\rho)^i}{i!} \right]^{-1},$$

$$p_n = \begin{cases} p_0\, \dfrac{(s\rho)^n}{n!}, & 1 \le n \le s, \\[2mm] p_0\, \dfrac{s^s \rho^n}{s!}, & s \le n < \infty, \end{cases}$$

where $\rho = \lambda / s\mu < 1$. Let Q = max(n − s, 0) (n = 0, 1, 2, ...) be the size of the queue not 
including those being served. Show that 

$$\text{(i)} \quad \gamma = \Pr\{Q = 0\} = \frac{\sum_{i=0}^{s} (s\rho)^i / i!}{\sum_{i=0}^{s-1} [(s\rho)^i / i!] + [(s\rho)^s / s!\,(1-\rho)]},$$

$$\text{(ii)} \quad E[Q] = (1 - \gamma)/(1 - \rho).$$

2. Compare the M/M/1 system for a first-come first-served queue discipline with 
one of last-come first-served type (for example, articles for service are taken from the 
top of a stack). How do the queue size, waiting time, and busy period distribution differ, 
if at all? 

Answer: Queue size and busy period do not differ but the waiting time distributions 
differ. Why? 


3. Consider the M/M/1 system with queue discipline of last-come first-served type. 
Let X(t) be the queue size at time t. Show that the process {X(t), t > 0} is a birth and 
death process and determine its parameters. 


Answer: $\lambda_n = \lambda$, $\mu_n = \mu$. 


4. Consider an infinitely many server queue with an exponential service time 
distribution with parameter μ. Suppose customers arrive in batches with the interarrival time 
following an exponential distribution with parameter λ. The number of arrivals in 
each batch is assumed to follow the geometric distribution with parameter p (0 < p < 1), 
i.e., Pr{number of arrivals in a batch is k} = $p^{k-1}(1 - p)$ (k = 1, 2, ...). 
Formulate this process as a continuous time Markov chain and determine explicitly 
the infinitesimal matrix of the process. 

Answer: $Q = \|q_{ij}\|$ where 

$$q_{i,i-1} = i\mu, \quad i \ge 1,$$
$$q_{ij} = \lambda p^{j-i-1}(1 - p), \quad j > i,$$
$$q_{ij} = 0, \quad j < i - 1,$$

and 

$$q_{ii} = -\sum_{j \ne i} q_{ij}.$$

5. (continuation). Determine the probability generating function π(s) of the 
equilibrium distribution of the process. 

Answer: 

$$\pi(s) = \left[ 1 + \frac{p}{1-p}(1 - s) \right]^{-\lambda / p\mu}.$$
6. (queueing with balking). Customers, with independent and identically distributed 
service time distribution H(x), arrive at a counter in the manner of a Poisson process 
with parameter λ. A customer who finds the server busy joins the queue with probability 
p (0 < p < 1). Derive the transition probabilities of the Markov chain embedded at the 
points of departure of customers. Can you find the limiting distribution of queue size? 

Answer: 

$$P = \begin{pmatrix} p_0 & p_1 & p_2 & \cdots \\ p_0 & p_1 & p_2 & \cdots \\ 0 & p_0 & p_1 & \cdots \\ 0 & 0 & p_0 & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}, \qquad p_j = \int_0^{\infty} e^{-\lambda p x}\, \frac{(\lambda p x)^j}{j!}\, dH(x),$$

$$K(s) = \sum_{j=0}^{\infty} p_j s^j, \qquad \pi(s) = \frac{(1 - \rho) K(s)(s - 1)}{s - K(s)},$$

where 

$$\rho = \lambda \alpha p < 1 \quad \text{and} \quad \alpha = \int_0^{\infty} x\, dH(x) < \infty.$$

7. Consider the M/M/1 queueing model with balking as in Problem 6. Now we assume 
that the interarrival distribution is exponential with parameter λ. The service time 
distribution is exponential with parameter μ. The balking parameter is p as in the example 
above. Formulate this model as a birth and death stochastic process. 

Answer: $\lambda_0 = \lambda$, $\lambda_n = \lambda p$ for n ≥ 1, $\mu_n = \mu$. 


8. The following two birth and death processes (cf. Section 4, Chapter 4) can be viewed 
as models for queueing with balking. 

(a) First consider a birth and death process with parameters 

$$\lambda_n = \lambda q^n, \quad 0 < q < 1, \quad \lambda > 0 \quad (n = 0, 1, 2, \ldots),$$
$$\mu_n = \mu, \quad n > 0,$$
$$\mu_0 = 0.$$

(b) Let the parameters be 

$$\lambda_n = \frac{\lambda}{n+1}, \qquad \mu_n = \mu \quad (n = 1, 2, \ldots), \qquad \mu_0 = 0.$$

Determine the stationary distribution in each case. 

Answer: (a) $p_m = p_0 (\lambda/\mu)^m q^{m(m-1)/2}$ for m ≥ 1. (b) $p_m = p_0 (\lambda/\mu)^m (1/m!)$ for m ≥ 0, 
whence $p_0 = e^{-\lambda/\mu}$. 
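The stationary distribution of a birth and death process can be checked numerically from the product formula $p_m = p_0 \prod_{k=0}^{m-1} \lambda_k / \mu_{k+1}$. The following is a minimal sketch under the assumption that truncating the state space at a finite size is harmless (the tails here decay very quickly); the function name and the truncation level are illustrative.

```python
import math

def stationary_distribution(birth, death, size):
    """Stationary probabilities p_0, ..., p_{size-1} of a birth and death
    process truncated to `size` states (truncation is an assumption).

    birth(n): birth rate lambda_n; death(n): death rate mu_n for n >= 1.
    """
    weights = [1.0]
    for m in range(1, size):
        # p_m / p_{m-1} = lambda_{m-1} / mu_m
        weights.append(weights[-1] * birth(m - 1) / death(m))
    total = sum(weights)
    return [w / total for w in weights]

# Case (b) with lambda = 2, mu = 1: the answer is Poisson with mean lambda/mu.
p_b = stationary_distribution(lambda n: 2.0 / (n + 1), lambda n: 1.0, 60)

# Case (a) with lambda = 1.5, mu = 1, q = 0.5.
p_a = stationary_distribution(lambda n: 1.5 * 0.5 ** n, lambda n: 1.0, 60)
```

In case (b) the computed `p_b[0]` should agree with $e^{-\lambda/\mu}$, and in case (a) the ratios `p_a[m] / p_a[0]` should agree with $(\lambda/\mu)^m q^{m(m-1)/2}$.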


9. Consider the problem of pedestrians wishing to cross a one-way road at a given 
point. Suppose vehicles (of zero length), which have the right of way, pass the point in 
the manner of a Poisson process with parameter μ. All waiting pedestrians will cross the 
road whenever a gap of at least T seconds appears for the first time between vehicles 
in the road. What is (i) the distribution of the waiting time for a pedestrian who arrives 
at an arbitrary time and (ii) the distribution of the time from the end of one possible 
pedestrian crossing point to the beginning of the next? Give answers in terms of Laplace 
transforms. Find the mean wait of a pedestrian. 

Answer: Both (i) and (ii) have the same Laplace transform 

$$L(s) = \frac{(\mu + s)\, e^{-\mu T}}{s + \mu\, e^{-(\mu + s)T}};$$

$$\text{mean wait} = \frac{e^{\mu T} - (1 + \mu T)}{\mu}.$$
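The mean wait can be recovered from a Laplace transform as −L′(0). The sketch below takes L(s) = (μ + s)e^{−μT} / (s + μe^{−(μ+s)T}) as the transform of the waiting time, which is an assumed reading of the answer, and compares a central-difference derivative at s = 0 with the closed form [e^{μT} − (1 + μT)]/μ. The function names and step size are illustrative.

```python
import math

def gap_wait_transform(s, mu, T):
    """Assumed Laplace transform of the wait for a gap of length >= T
    in Poisson traffic of rate mu."""
    return (mu + s) * math.exp(-mu * T) / (s + mu * math.exp(-(mu + s) * T))

def mean_wait_closed_form(mu, T):
    """Closed-form mean wait: (e^{mu T} - (1 + mu T)) / mu."""
    return (math.exp(mu * T) - (1.0 + mu * T)) / mu

def mean_wait_numeric(mu, T, h=1e-6):
    """Mean wait as -L'(0), using a central difference of step h."""
    return -(gap_wait_transform(h, mu, T) - gap_wait_transform(-h, mu, T)) / (2.0 * h)
```

The two means should agree to within the accuracy of the finite difference, and L(0) should equal 1.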


10. (M/G/∞ system). Suppose that at a counter there are infinitely many servers, 
so that there is no waiting time for customers, but we are interested in the number of 
busy servers. Customers arrive at times corresponding to a Poisson process with 
parameter λ. The service times for the customers are independent and identically distributed. 
Let H(x) denote the service distribution. If there are initially no customers present 
find: 

(i) $P_k(t)$ = Pr{at time t there are exactly k customers being served}; 

(ii) $\lim_{t \to \infty} P_k(t) = P_k$. (Assume $\alpha = \int_0^{\infty} x\, dH(x) < \infty$.) 

Hint: Use the law of total probability and the fact that, given n arrivals by time t, the 
instants of arrival are distributed like the order statistics of n independent observations 
from the uniform distribution on (0, t]. 

Answer: 

$$\text{(i)} \quad P_k(t) = \exp\left\{ -\lambda \int_0^t [1 - H(x)]\, dx \right\} \frac{\left[ \lambda \int_0^t [1 - H(x)]\, dx \right]^k}{k!};$$

$$\text{(ii)} \quad P_k = e^{-\lambda \alpha}\, \frac{(\lambda \alpha)^k}{k!}, \qquad \alpha = \int_0^{\infty} x\, dH(x).$$
k! 0 


11. In an (M/G/∞) queueing system customers arrive in a Poisson process with 
parameter λ and have identically and independently distributed service times with 
distribution function H(x). Initially there are no customers being served. Show that the 
probability of n departures by time t is given by 

$$\frac{1}{n!} \left[ \lambda \int_0^t H(u)\, du \right]^n \exp\left[ -\lambda \int_0^t H(u)\, du \right].$$


12. Consider the queueing process of Problem 11. Show that the probability φ(t, T) 
of no departures in (t, t + T) satisfies the recursion relation 

$$\varphi(t, T) = \int_0^t \lambda e^{-\lambda \tau} [H(t - \tau) + 1 - H(t + T - \tau)]\, \varphi(t - \tau, T)\, d\tau$$

$$\qquad + \int_t^{t+T} \lambda e^{-\lambda \tau} [1 - H(T + t - \tau)]\, \varphi(0, T + t - \tau)\, d\tau + e^{-\lambda(t+T)}. \tag{*}$$

Hint: Examine the possibilities at the instant of the first arrival. 


13. Using the result of Problem 12 prove that 

$$\varphi(t, T) = \exp\left[ -\lambda \int_t^{t+T} H(\xi)\, d\xi \right].$$

Hint: Derive a first-order differential equation (in the variable t) for φ(t, T) and solve. 


14. (continuation). Let $\varphi_n(t, T)$ denote the probability of n departures in time 
(t, t + T). Derive an integral equation for $\varphi_n$ in terms of $\varphi_{n-1}$ in the spirit of (*). Then 
show that 

$$\varphi_n(t, T) = \frac{1}{n!} \left[ \lambda \int_t^{t+T} H(\xi)\, d\xi \right]^n \exp\left[ -\lambda \int_t^{t+T} H(\xi)\, d\xi \right].$$


15. Problem 10 was concerned with an infinitely many-server queue with Poisson 
arrival pattern. Consider the dual system G/M/∞ where the interarrival times are 
independent and identically distributed with density function h(x), and the service 
times are independent and exponentially distributed with parameter μ. There are 
infinitely many servers. Determine the transition probability matrix for the embedded 
Markov chain whose state variable $\xi_n$ is the number of busy servers at the times of 
successive arrivals. 

Answer: 

$$\Pr\{\xi_{n+1} = j \mid \xi_n = i\} = P_{ij} = \binom{i+1}{j} \int_0^{\infty} e^{-j\mu x} (1 - e^{-\mu x})^{i+1-j}\, h(x)\, dx.$$


16. In the M/G/1 system let $B_1, B_2, \ldots$ be a sequence of independent and identically 
distributed random variables whose distribution function B(x) coincides with that of 
the busy period of the system. Suppose that the first customer of a busy period has 
service time X (with distribution function H(x) = Pr{X ≤ x}) and that n other customers 
arrive during his service period. Show that 

$$B(x) = \Pr\{X + B_1 + B_2 + \cdots + B_n \le x\}.$$

From this establish that the Laplace transform 

$$\beta(\theta) = \int_0^{\infty} e^{-\theta x}\, dB(x)$$

satisfies the functional equation 

$$\beta(\theta) = \psi(\theta + \lambda(1 - \beta(\theta))),$$

where 

$$\psi(\theta) = \int_0^{\infty} e^{-\theta x}\, dH(x).$$

Use this result to find the mean duration of a busy period. 

Hint: The busy period is independent of the mode of service. Suppose (as n = 0 is 
trivial) that n > 0 customers arrive during the initial service. With the first new arrival, 
begin another busy period; after completing that busy period go back to the second 
arrival of the first service period and start another busy period with this customer; repeat n 
times. 

Answer: 

$$\text{Mean length of busy period} = \frac{\alpha}{1 - \lambda \alpha}, \qquad \alpha = \int_0^{\infty} x\, dH(x).$$
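The functional equation for the busy-period transform can be solved numerically by successive substitution. The sketch below assumes the equation in the form β(θ) = ψ(θ + λ(1 − β(θ))), with ψ the Laplace transform of the service distribution, and starts the iteration from 0; the function name, starting value, and iteration count are illustrative choices. For exponential service with rate μ the fixed point can be compared with the root of the quadratic λβ² − (λ + μ + θ)β + μ = 0.

```python
import math

def busy_period_transform(theta, lam, psi, iterations=500):
    """Solve beta = psi(theta + lam * (1 - beta)) by fixed-point iteration.

    psi: Laplace transform of the service-time distribution (a callable).
    The start at 0 and the fixed iteration count are assumptions of this sketch.
    """
    beta = 0.0
    for _ in range(iterations):
        beta = psi(theta + lam * (1.0 - beta))
    return beta

# Exponential service with rate mu (an assumed special case): psi = mu/(mu+theta).
lam, mu = 0.5, 1.0
psi_exp = lambda th: mu / (mu + th)

beta_half = busy_period_transform(0.5, lam, psi_exp)

# Mean busy period as (1 - beta(h)) / h for small h, to compare with
# alpha / (1 - lam * alpha), where alpha = 1/mu is the mean service time.
h = 1e-4
mean_estimate = (1.0 - busy_period_transform(h, lam, psi_exp)) / h
```

Here `beta_half` should match the smaller root of the quadratic at θ = 0.5, and `mean_estimate` should be close to α/(1 − λα) = 2.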

17. Under the same conditions as in Problem 3 with λ < μ, consider a customer just 
arriving and compute the probability that exactly n other customers are served during 
his waiting time, given that he does not find the server free on his arrival. 

Hint: Imitate the method of Problem 16 to show that the probability generating 
function g(s) of the number of customers served during a busy period satisfies the 
functional equation $g(s) = \mu s[\mu + \lambda - \lambda g(s)]^{-1}$. 

Answer: 

$$g(s) = \frac{\lambda + \mu}{2\lambda} \left[ 1 - \left( 1 - \frac{4\lambda\mu s}{(\lambda + \mu)^2} \right)^{1/2} \right].$$


18. Consider a queueing process where customers are arriving regularly at times 
n/λ, n = 0, 1, 2, .... Assume that the service time $X_j$ for the jth customer is exponentially 
distributed with expectation 1/μ. Assume λ > μ. Determine the probability that the 
server stays busy for an infinite length of time when there is one customer in the queue 
at time 0. 

Hint: Show that the desired probability is 

$$\Pr\{X_1 + X_2 + \cdots + X_i > i \text{ for all } i = 1, 2, \ldots\},$$

where $X_i$ are independent r.v.'s with exponential distribution with parameter λ/μ (use 
Lemma 8.1 in the appropriate manner). 

Answer: 1 − μ/λ. 


19. (preemptive priority queueing). Consider a single-server queueing process that 
has two classes of customers (priority and nonpriority) with independent Poisson 
arrival rates with parameters $\lambda_1$ and $\lambda_2$ ($\lambda_1 + \lambda_2 = 1$) and with service times independent 
and exponentially distributed with parameters $\mu_1$ and $\mu_2$, respectively. Within classes 
there is a first-come first-served queue discipline and the service of priority customers is 
never interrupted. If a priority customer arrives during the service of a nonpriority 
customer, then the latter's service is immediately stopped in favor of the priority 
customer. The interrupted customer's service is resumed when there are no priority 
customers present. Let $p_{m,n}$ be the equilibrium probability that there are m priority and n 
nonpriority customers in the system. Equilibrium is achieved when $\rho_1 + \rho_2 < 1$ 
($\rho_1 = \lambda_1/\mu_1$, $\rho_2 = \lambda_2/\mu_2$). Establish that the $p_{m,n}$ satisfy the system of equations 

$$[\lambda_1 + \lambda_2 + \mu_1(1 - \delta_{m0}) + \mu_2(1 - \delta_{n0})\delta_{m0}]\, p_{m,n}$$
$$= \lambda_1 p_{m-1,n} + \lambda_2 p_{m,n-1} + \mu_1 p_{m+1,n} + \mu_2 \delta_{m0}\, p_{m,n+1} \quad (m, n = 0, 1, 2, \ldots),$$

where $\delta_{ij}$ is the Kronecker delta, and where it is understood that any p with a negative 
suffix is zero. Using this equation show that the mean number of nonpriority customers 
is 

$$\sum_{m=0}^{\infty} \sum_{n=0}^{\infty} n\, p_{m,n} = \frac{\rho_2}{1 - \rho_1 - \rho_2} \left[ 1 + \frac{\mu_2 \rho_1}{\mu_1 (1 - \rho_1)} \right].$$


20. Show for the M/M/1 queueing process in a stationary state that the distribution 
of time between successive departures has the same (exponential) distribution as the 
interarrival time distribution (see also Problem 10, Chapter 4). 


21. Customers arrive in a queue with a general independent interarrival time distribution. 
Examine the structure of the queue, specifically at regeneration points, of the following 
two systems: (i) there are s servers and the same exponential service time distribution 
for each server; (ii) there is a single server and an Erlangian service time distribution. 


22. Consider the following generalization of the GI/G/1 queueing system with inter- 
arrival distribution function A(t) and service time distribution function B(t), with finite 
means a and b, respectively: A customer who arrives to find the server idle waits a 
time with distribution function V(t) and finite mean v before commencing service. 

If $F_n(x)$ is the distribution function of the waiting time for the nth arrival, show that 
the limit $F(x) = \lim_{n \to \infty} F_n(x)$ exists and 

(i) if b − a > 0, then F(x) = 0; 

(ii) if b − a < 0, then F(x) is a distribution function. 
23. We extend the idea of Problem 9 to a junction of two one-way one-lane roads 
A and B with traffic in A having absolute right of way; there is a stop sign in road B. 
As before, vehicles in A pass the junction in the manner of a Poisson process with 
parameter μ, and vehicles in B arrive at the junction in the manner of a Poisson process with 
parameter λ and queue up, waiting to enter the junction. When a road B vehicle reaches 
the head of the queue it waits until the first gap between A vehicles of length at least T 
appears in the A traffic, and it then takes time T to enter the junction. Find the 
probability generating function of the distribution of the number of B vehicles in the queue 
when the system is in a stationary state, and find the expected stationary queue size. 

Hint: We have an example of the M/G/1 queueing system, and it is sufficient to find the 
"service time" distribution of road B vehicles. 

Answer: 

$$\pi(s) = \frac{(1 - \rho)(s - 1) K(s)}{s - K(s)}, \qquad K(s) = \beta(\lambda - \lambda s), \qquad \rho = \frac{\lambda (e^{\mu T} - 1)}{\mu},$$

$$\beta(\theta) = \int_0^{\infty} e^{-\theta x}\, dH(x) = \frac{(\mu + \theta)\, e^{-(\mu + \theta)T}}{\theta + \mu\, e^{-(\mu + \theta)T}}.$$
NOTES 


The literature of queueing theory is voluminous. An elegant monograph reviewing 
this theory and its applications is that of Cox and Smith [1]. 

We also direct the student to the advanced books by Takács [2] and Riordan [3]. 

A compendium of results on queueing theory is contained in Saaty [4]. This reference 
also includes an extensive bibliography. 

Applications to congestion theory and telephone trunking problems can be found 
in Syski [5]. 

Some special mathematical aspects of queueing theory are developed in the mono- 
graph by Beneš [6]. 

An updated treatment of queueing processes of wide scope in applications is the 
recent book of Kleinrock [7]. The intriguing topic of network queues with its implica- 
tions is amply discussed therein. 
MISCELLANEOUS PROBLEMS 


Several of the following problems, as well as some from preceding chapters, are 
based on research papers of the recent literature. As noted in the preface, we regret- 
fully cannot present the many relevant references, confining ourselves to appropriate 
books cited at the end of each chapter. 

A Star-designated problem is more difficult, although accessible by the methods 
we have presented. 


1. Consider standard Brownian motion in n dimensions starting at the origin. Let S be 
the first passage time to the surface of the unit sphere and let X(S) be the position at that 
moment. Establish the “intuitive” proposition that S and X(S) are independent random 
variables. 


2. For Brownian motion in three dimensions show that the occupation time of the unit 
ball has the same probability distribution as that of the first passage time out of the unit 
circle for Brownian motion in one dimension. 


*3. Consider a standard Brownian motion {B(t)} and for a time point t define 

$$\beta_t = \sup\{s \le t;\ B(s) = 0\} = \text{the last zero prior to } t;$$
$$\gamma_t = \inf\{s \ge t;\ B(s) = 0\} = \text{the first zero after } t.$$

It is well known (see Chapter 7) that $\Pr\{0 < \beta_t < t < \gamma_t < \infty\} = 1$. The interval $(\beta_t, \gamma_t)$ 
is called the excursion interval straddling t and $\{|B(s)|;\ \beta_t \le s \le \gamma_t\}$ is called the excursion 
process. For a > 0 and ε > 0 define 

$$S(t, a, \varepsilon) = \int_{\beta_t}^{\gamma_t} 1_{[a, a+\varepsilon)}(|B(s)|)\, ds,$$

which is the amount of time the excursion process spends in [a, a + ε). Show that 
$\lim_{\varepsilon \downarrow 0} \varepsilon^{-1} S(t, a, \varepsilon) = S(t, a)$ exists, where S(t, a) is called the local time of the excursion 
process for the level a. [Show that the moments of $\varepsilon^{-1} S(t, a, \varepsilon)$ converge as ε ↓ 0.] Prove 
that 


lim Efe OY] de 24) 2 


afd 
4. Consider a regular diffusion {X(t)} on I = (l, r) and let T(a) be the hitting time to a 
state a, and let T(a, b) = T(a) ∧ T(b) be the hitting time to a or b. For l < a < x < b < r 
define the Laplace transforms 

$$\varphi_a(x) = E[e^{-\lambda T(a)} \mid X(0) = x], \qquad \varphi_{ab}(x) = E[e^{-\lambda T(a,b)} \mid X(0) = x],$$

etc. Establish the following relation between the Laplace transform $\varphi_{ab}(x)$ of the two-point 
hitting time and the Laplace transforms for several one-point hitting times: 

$$\varphi_{ab}(x) = \frac{\varphi_a(x)[\varphi_b(a) - 1] + \varphi_b(x)[\varphi_a(b) - 1]}{\varphi_a(b)\varphi_b(a) - 1}.$$

Hint: Let $A = E[e^{-\lambda T_a} I_a \mid X(0) = x]$ and $B = E[e^{-\lambda T_b} I_b \mid X(0) = x]$ where $I_a = 1 - I_b$ 
is unity if $T_a < T_b$ and zero if $T_b < T_a$. Show that $\varphi_a(x) = A + B\varphi_a(b)$ and $\varphi_b(x) = A\varphi_b(a) 
+ B$. Solve for A and B and then $\varphi_{ab}(x) = A + B$. 


5. Consider a regular diffusion {X(t)}, let $T_a$ be the hitting time to a state a, let $t_a(x) = 
E[T_a \mid X(0) = x]$, and let $\pi_a(x) = \Pr\{T_a < T_b \mid X(0) = x\}$. Establish the formula 

$$\pi_a(x) = \frac{t_a(b) + t_b(x) - t_a(x)}{t_a(b) + t_b(a)}.$$

Hint: 

$$t_a(x) = E[\min\{T_a, T_b\} \mid X(0) = x] + \pi_b(x)\, t_a(b)$$

and 

$$t_b(x) = E[\min\{T_a, T_b\} \mid X(0) = x] + \pi_a(x)\, t_b(a).$$

Subtract and use $\pi_b(x) = 1 - \pi_a(x)$. 


6. Let T be exponentially distributed with parameter p > 0 and consider the observation 
process 


$$Y(t) = \begin{cases} B(t) & \text{for } 0 \le t \le T, \\ (t - T) + B(t) & \text{for } T < t, \end{cases}$$

where B(t) is standard Brownian motion. In a quality control model, T represents the 
unobservable time that an undesirable disturbance enters a system causing a displacement 
in the infinitesimal drift from 0 to 1. It is desired to detect the disturbance as quickly as 
possible after it occurs. Introduce the posterior probability that the disturbance has oc- 
curred given the observation process, 


X(t) = Pr{T <t|Y(s),0 < s < t}. 
Formally show that {X(t)} is a diffusion process on [0, 1] having infinitesimal coefficients 
Mœ) =- xip- x- x), x) = x — x)’. 


Hint: Conditioned on X(t) = x, the observation increment ΔY = Y(t + Δt) − Y(t) is 
normally distributed with mean Δt and variance Δt with probability x, and is normally 
distributed with mean zero and variance Δt with probability 1 − x. The conditional 
density for ΔY is 

$$f(\Delta y) = \frac{1}{\sqrt{2\pi \Delta t}} \left\{ x\, e^{-(\Delta y - \Delta t)^2 / 2\Delta t} + (1 - x)\, e^{-(\Delta y)^2 / 2\Delta t} \right\}.$$

From Bayes rule, $q(x, \Delta y) = \Pr\{T \le t \mid X(t) = x, \Delta Y = \Delta y\}$ is given by $q(x, \Delta y) = 
x f_1(\Delta y) / f(\Delta y)$, where $f_1$ is the normal density with mean Δt and variance Δt, and then 
$X(t + \Delta t) = q(x, \Delta y) + [1 - q(x, \Delta y)]\, p\, \Delta t + O(\Delta t)^2$. Now evaluate 
E[ΔX] and E[(ΔX)²] carrying terms up to $O(\Delta t)^2$. 

7 (continuation). Suppose that action is taken with respect to the process whenever 
X(t) > &* for some critical value €*. A “false alarm” occurs when action is taken where 


no disturbance has occurred. Give a conditional probability argument that Pr{false 
alarm} = &* under this control rule. 


8 (continuation). The likelihood ratio is the statistic Z(t) = X(t)/[1 − X(t)]. Show that 
Z(t) is a diffusion process for which 
uz) =(L+ 2p t+? —2/i +2), =z. 


9 (continuation). Since the situation where disturbances are rare is of interest, we 
consider the case p — 0, first forming the process W,(t) = Z(t)/p. Show formally that W(t) = 
lim-o W,(t) is a diffusion process on I = [0, œ) with infinitesimal coefficients 


Hyw(w)=1—w, ohw) = w?. 


*10. Consider the recursion formula 


1 1 
Ry — HNP = Hy SOE) + ee oN + AED — EN” 


where X) = 0 and 7%") are independent and identically distributed with mean O and 
variance 1. Show informally that X{¥}, > X(t) where {X(£)} is a diffusion with infinitesimal 
parameters p(x) = f(x) + Ag(x)g'(x) and o7(x) = [g(x)]?. 


11. Let g be a bounded and continuous function on (0, 00) for which 


1 T 
lim ral g(s)ds = A #0. 


T> 0 (0) 


Let Y(t) be a stationary Ornstein-Uhlenbeck process and for each € > 0 form the weighted 


integral 
1 fs s 
B=- | g> Y(> 
o= fea)” 


t/e2 
=g g(v) Y (v) dv. 
0 
Show that B,(¢) is a zero mean Gaussian process and that B,(t) converges to standard 
Brownian motion as e | 0. 


Hint: The B,(t) constitute a family of Gaussian processes. So show that the covariance 
matrix for any finite set of time points converges. 


12. Consider the diffusion on I = [0, 1] with u(x) = —y.(1 — x) + yıx and o?(x) = 
2Bx(1 — x). Determine the spectral expansion of the transition density when 0 and 1 are 
entrance boundaries. 

Hint: Identify the eigenfunctions and eigenvalues of 


Ax = vp" + foe nd = x)]9! = Ae 
subject to the boundary conditions 


d 
dS 


=a- fe = 0 


x= 1— x= 1— 


so 2 
x=0+ dS 


= yn @ 
x=O+ dx 


as the Jacobi polynomials 

p(x) = PrP — 2x) a=y—-1, B=y—-1 
and eigenvalues 1, = Bn(n + y, + y2 — 1). 
Answer: 


p(t, x, yY) = MYTA — yy"! Ye ONP 


n=0 


_ T(n + y(n + 91 +92 — 1) 
‘ Tan+y,)0n + 1) 


where M is a suitable constant. 


(2n +y + 72-1), 


13. Compare the Ito and Stratonovich solutions of the stochastic differential equation 
dX(t) = X(t)[1 − X(t)] dB(t), where B(t) is standard Brownian motion. 


Answer: The Ito solution is the diffusion process on I = (0, 1) with infinitesimal 
parameters $\mu(x) = 0$ and $\sigma^2(x) = x^2(1 - x)^2$. Both boundaries are natural. The Stratonovich 
solution is the diffusion process with infinitesimal parameters $\mu(x) = x(1 - x)(\tfrac{1}{2} - x)$ 
and $\sigma^2(x) = x^2(1 - x)^2$. 
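The Stratonovich drift in this answer follows from the standard conversion rule: a Stratonovich equation dX = σ(X) ∘ dB corresponds to an Ito equation with the same diffusion coefficient and the additional drift ½σ(x)σ′(x). The sketch below evaluates that correction with a central-difference derivative; the function name and step size are illustrative, and the helper is not tied to any particular library.

```python
def stratonovich_to_ito_drift(sigma, x, h=1e-5):
    """Extra Ito drift 0.5 * sigma(x) * sigma'(x) arising from a
    Stratonovich equation dX = sigma(X) o dB (central-difference sketch)."""
    d_sigma = (sigma(x + h) - sigma(x - h)) / (2.0 * h)
    return 0.5 * sigma(x) * d_sigma

# Coefficient of this problem: sigma(x) = x (1 - x).
sigma = lambda x: x * (1.0 - x)
drift_at_quarter = stratonovich_to_ito_drift(sigma, 0.25)
```

At x = 0.25 the correction should equal x(1 − x)(½ − x) = 0.046875, and it vanishes at x = ½ where σ′ changes sign.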


14. Consider a diffusion on [0, ∞), $\sigma^2(x) = \sigma^2$ and $\mu(x) = \mu x$, μ > 0, where 0 is considered 
a regular exit boundary. This process is called by some business analysts a compounded 
Brownian motion, where x can be interpreted as the assets which increase through interest 
earnings at a rate μ modulo Brownian motion fluctuations. 

Find the probability u(x) of ruin (i.e., the probability that the process reaches 0 from an 
initial level x > 0). 


15. Consider the k-gene haploid selection—mutation diffusion model analogous to 
Example VI in Section 2 of Chapter 15. This is a diffusion on the k-dimensional simplex I, 
T= {x = (Xp... Xk) X, SOX, + + 1 < 1}, 

having diffusion coefficients 
HX) = Vi = Or H+ + YW + OL — xi) + 2ox(xt +--+ + xk) — Xi 


and yh 
x(1 — x), i=); 
—Xj{Xj, ifxj. 


cx) = i 


Here y; > Ofori = 1,...,k and x, = 1 — (x, +--+ + xXp-1). Show that the density 


(x)= (TL?) exp(—0 xt) 


i=1 i=1 


satisfies the forward Kolmogorov equation stationarity condition 


2 [ai(x)o(x)] = ed 


2 


=0. 
l Ox, Ox, im ax, [iu(xe(x)] 


1 
2; 
16. Consider for standard Brownian motion {B(t), t > 0} the random variable U as the 
occupation time of the positive axis taken until B(t) first attains the level 1. Let T* be the 
first passage time to the exterior of [—1, 1]. Show that U and T* have the same distribution. 


17. Consider Brownian motion {X(t), t ≥ 0}, with negative drift parameter c < 0 and 
variance coefficient $\sigma^2$, i.e., X(t) = σB(t) + ct, B(t) standard Brownian motion. Define 
M(t) = sup{X(s); 0 ≤ s ≤ t}. Determine the limiting distribution $\lim_{t \to \infty} \Pr\{M(t) \le x\}$. 

Answer: exponential distribution with mean $\sigma^2 / 2|c|$. 


18. Let {X(t), t > 0, X(0) = 0} be Brownian motion with drift, i.e., with infinitesimal 
parameters u(x, t) = ct, o?(x, t) = o? so that X(t) = oB(t) + ct. Show that for fixed s 


max (M(t) — X(t)) with M(t) = max X(t) 


O< tks Os tSt 


has the same distribution as maXo<;<;s Y(t), where Y(t) is Brownian motion process with 
drift parameters —c, that is, 


Y(t) = oB(t) — ct 
and exhibiting a reflecting barrier at 0. 
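19. Let B(t) be standard Brownian motion. For fixed s > 0 determine the joint density of 

$$M(s) = \max_{0 \le t \le s} B(t), \qquad \Theta(s) = \{\text{the first time value } \theta \text{ where } B(\theta) = M(s)\}$$

and the endpoint value of B(s). 

Answer: 

$$p(\theta, y, x)\, d\theta\, dy\, dx = \Pr\{\Theta \in d\theta,\ M(s) \in dy,\ B(s) \in dx\}$$

$$= \frac{1}{\pi}\, \frac{y (y - x)}{[\theta (s - \theta)]^{3/2}} \exp\left[ -\frac{y^2}{2\theta} - \frac{(y - x)^2}{2(s - \theta)} \right] d\theta\, dy\, dx.$$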


20. Let X(t), t > 0 be Brownian motion with drift (mean coefficient ut and variance 
coefficient o7t. Let M(t) = maxy. ,<, X(t) and T, the time of first passage through level 
a > 0. Prove 


Pr} max MOS 2 - AU) 


0< ts Ta 


jad ay as, 0<0<s, x<y. 


< y|X(0) = o} = apf- - (e? yh 


*21. Consider two types of objects evolving according to the stochastic differential 
equations 


$$dL = L a_1\, dt + \sqrt{a_1^2 L^2 + b_1^2 L}\; dB^{(1)}, \qquad dM = M a_2\, dt + \sqrt{a_2^2 M^2 + b_2^2 M}\; dB^{(2)},$$

where $B^{(1)}$ and $B^{(2)}$ are independent Brownian motions. Let $N(t) = L(t) + M(t)$ be the 
total population size, and $X(t) = L(t)/N(t)$ be the fraction of type L. By use of the vector Ito 
transformation formula, derive a stochastic differential equation satisfied by $X(t)$. 


22. Let $\{X_n\}$ be a Markov chain with transition matrix $P = \|p_{ij}\|$. Define 
$g(i, j) = \sum_{n=0}^{\infty} p_{ij}^{(n)} \le \infty$. For a fixed state $j$, show that $Y_n = g(X_n, j)$ defines a nonnegative (possibly 
infinite valued) supermartingale with respect to $\{X_n\}$. 


Hint: Use the identity $G = I + PG$. 
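
The identity in the hint and the resulting supermartingale inequality can be checked numerically on a small example. The following is only a sketch under stated assumptions: the 3-state substochastic matrix `P` (a chain killed with positive probability from every state, so all states are transient), the helper names, and the truncation of the series for G are all illustrative choices, not part of the problem.

```python
# Illustrative check of G = I + PG and of the inequality
# sum_k p_ik g(k, j) <= g(i, j) for a small transient (substochastic) chain.
# The matrix P is a made-up example, not taken from the text.

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def green_matrix(p, terms=2000):
    """Approximate G = sum_{n>=0} P^n by a truncated series."""
    n = len(p)
    g = identity(n)
    power = identity(n)
    for _ in range(terms):
        power = mat_mul(power, p)
        g = [[g[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return g

P = [[0.2, 0.5, 0.1],
     [0.3, 0.1, 0.4],
     [0.0, 0.6, 0.2]]

G = green_matrix(P)
PG = mat_mul(P, G)
I = identity(3)

# G = I + PG, entry by entry (up to truncation and rounding error).
max_identity_error = max(abs(G[i][j] - I[i][j] - PG[i][j])
                         for i in range(3) for j in range(3))

# E[g(X_{n+1}, j) | X_n = i] = (PG)_{ij} <= g(i, j).
is_supermartingale = all(PG[i][j] <= G[i][j] + 1e-12
                         for i in range(3) for j in range(3))
print(max_identity_error, is_supermartingale)
```

Since (PG)(i, j) = g(i, j) for i different from j and g(j, j) - 1 for i = j, the inequality is strict only when the chain sits at the fixed state j.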
23. Let $T_1, T_2, \ldots$ be the times of events in a Poisson process of rate $\lambda$. Independently, 
let $X_1, X_2, \ldots$ be nonnegative independent identically distributed random variables having 
distribution function $F$. Determine the distribution of $Z = \min\{T_1 + X_1, T_2 + X_2, \ldots\}$. 


Hint: Condition on the number of events of the Poisson process up to time $z$. 
Answer: $\Pr\{Z \le z\} = 1 - \exp\left(-\lambda \int_0^z F(y)\, dy\right)$. 
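
A Monte Carlo sanity check of the stated answer may be sketched as follows. Everything specific here is an assumption made for illustration: the rate `lam = 2`, the level `z = 1`, the choice F(y) = 1 - exp(-y) for the distribution of the X's, the seed, and the function names.

```python
import math
import random

def smallest_completion(lam, draw_x, horizon, rng):
    """Smallest T_i + X_i over events with T_i <= horizon (inf if none).
    Events after `horizon` cannot give T_i + X_i <= horizon since X_i >= 0,
    so this is enough to decide whether Z <= horizon."""
    t = 0.0
    best = math.inf
    while True:
        t += rng.expovariate(lam)      # next Poisson event time T_i
        if t > horizon:
            break
        best = min(best, t + draw_x(rng))
    return best

def check(lam=2.0, z=1.0, n_samples=100_000, seed=12345):
    rng = random.Random(seed)
    draw_x = lambda r: r.expovariate(1.0)   # assumed F(y) = 1 - exp(-y)
    hits = sum(smallest_completion(lam, draw_x, z, rng) <= z
               for _ in range(n_samples))
    empirical = hits / n_samples
    # integral_0^z (1 - e^{-y}) dy = z - (1 - e^{-z})
    integral = z - (1.0 - math.exp(-z))
    exact = 1.0 - math.exp(-lam * integral)
    return empirical, exact

empirical, exact = check()
print(empirical, exact)
```

The two printed numbers should agree to within ordinary sampling error.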


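24. Let $X_1, X_2, \ldots$ be independent random variables with common distribution function 
$F(t)$. Given that $\lim_{t \to \infty} t^2[1 - F(t)] = 0$, prove that 

$$\max_{1 \le i \le n} \frac{X_i}{\sqrt{n}} \to 0 \quad \text{in probability}.$$

Hint: Fix $\varepsilon > 0$. Let $a_n = \Pr\{\max_{1 \le i \le n} (X_i/\sqrt{n}) \le \varepsilon\}$. Then 

$$a_n = F^n(\varepsilon\sqrt{n}) = \exp[n \log(1 - (1 - F(\varepsilon\sqrt{n})))] \to 1 \quad \text{as } n \to \infty.$$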


25. Consider the following simplified model of neuron firing. Electrical impulses arriving 
along the input fibers of a nerve cell can be one of two types, either a stimulus or an inhibitor. 
If one or more inhibitors arrive immediately preceding a stimulus, the cell does not respond. 
Otherwise it responds to the stimulus. Assume the stimuli arrive according to a Poisson 
process with parameter $\lambda$ and the inhibitors according to a Poisson process with parameter 
$\mu$, independent of each other. Calculate the Laplace transform of the response distribution 
if at time $t = 0$ we begin with a stimulus and thus a response. 


Answer: $H(\theta) = \lambda(\lambda + \theta)/[\theta(\theta + \lambda + \mu) + \lambda(\lambda + \theta)]$. 
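
The answer can be compared with a simulation of the time between successive responses. The sketch below rests on one interpretive assumption, namely that a stimulus is blocked exactly when at least one inhibitor has arrived since the previous stimulus; the parameter values, seed, and function names are likewise illustrative.

```python
import math
import random

def response_time(lam, mu, rng):
    """Time from one response to the next, under the assumed reading that a
    stimulus is blocked iff an inhibitor arrived since the previous stimulus."""
    total = 0.0
    while True:
        gap = rng.expovariate(lam)              # time to the next stimulus
        first_inhibitor = rng.expovariate(mu)   # time to the next inhibitor
        total += gap
        if first_inhibitor > gap:               # no inhibitor in the gap
            return total

def laplace_formula(theta, lam, mu):
    return lam * (lam + theta) / (theta * (theta + lam + mu) + lam * (lam + theta))

def estimate(theta=1.0, lam=2.0, mu=3.0, n_samples=100_000, seed=2024):
    rng = random.Random(seed)
    acc = sum(math.exp(-theta * response_time(lam, mu, rng))
              for _ in range(n_samples))
    return acc / n_samples, laplace_formula(theta, lam, mu)

empirical, exact = estimate()
print(empirical, exact)
```

By the memoryless property, restarting the inhibitor clock at each stimulus does not change the law of the process, which is why a fresh exponential is drawn in every gap.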


26. Consider the following generalization to variable population sizes of the basic 
Wright-Fisher genetic model. Let $M_n > 0$ be a random variable denoting the population 
size at time $n$ and assume that there are two types, A and a. Let $X_n$ be the number of A-types 
at time $n$. Assume (a) the process $\{(X_n, M_n)\}$ is Markov and (b) the distribution of 
$X_{n+1}$ given $M_n$, $M_{n+1}$, and $X_n$ is binomial with distribution 

$$\binom{M_{n+1}}{j} \left(\frac{(1 + \sigma)X_n}{M_n + \sigma X_n}\right)^{j} \left(\frac{M_n - X_n}{M_n + \sigma X_n}\right)^{M_{n+1} - j}, \qquad 0 \le j \le M_{n+1},$$

where $\sigma > 0$ represents a selection parameter. Show that fixation ($X_n = 0$ or $X_n = M_n$) 
occurs with probability 1. 


Hint: Show that $\{Y_n = X_n/M_n\}$ is a submartingale and that $Y = \lim Y_n$ satisfies 
$E[Y(1 - Y)/(1 + \sigma Y)] = 0$. 


27. In a population of two types, let $X_n$ be the number of type A and $Y_n$ the number of 
type a in the nth generation. Assume that $M_n = X_n + Y_n$, the total population size, grows 
deterministically and that $X_{n+1}$ is binomially distributed with parameters $p_{n+1} = X_n/M_n$ 
and $M_{n+1}$, given $X_n$. Start with $X_0 = Y_0 = 1$. Show that fixation ($X_n = 0$ or $X_n = M_n$ 
for some $n$) occurs with probability 1 if and only if $\sum_{n} 1/M_n = \infty$. 


Hint: $Z_n = X_n/M_n$ is a bounded martingale, converging to $Z$, say. Then 

$$h_n = E[Z_n(1 - Z_n)] \to h = E[Z(1 - Z)].$$

But 

$$h_{n+1} = h_n\left(1 - \frac{1}{M_{n+1}}\right) = h_0 \prod_{k=1}^{n+1}\left(1 - \frac{1}{M_k}\right).$$
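
The one-step recursion in the hint, and the way the product behaves for slowly and rapidly growing populations, can be illustrated numerically. This is a sketch only: the configuration `(x, m_now, m_next)` and the two population-size sequences are made-up examples, and the function names are hypothetical.

```python
from math import comb

def expected_heterozygosity(x, m_now, m_next):
    """E[Z'(1 - Z')] where Z' = X'/m_next and X' ~ Binomial(m_next, x/m_now)."""
    p = x / m_now
    total = 0.0
    for j in range(m_next + 1):
        prob = comb(m_next, j) * p**j * (1.0 - p)**(m_next - j)
        z = j / m_next
        total += prob * z * (1.0 - z)
    return total

def h_product(sizes):
    """h_0 * prod_{k>=1} (1 - 1/M_k) for a finite list sizes = [M_0, M_1, ...]."""
    h = 0.25                      # h_0 = E[Z_0(1 - Z_0)] with X_0 = Y_0 = 1
    for m in sizes[1:]:
        h *= 1.0 - 1.0 / m
    return h

# One-step identity for an arbitrary (made-up) configuration.
x, m_now, m_next = 3, 10, 14
lhs = expected_heterozygosity(x, m_now, m_next)
p = x / m_now
rhs = p * (1.0 - p) * (1.0 - 1.0 / m_next)

linear = h_product([2 + n for n in range(5000)])           # sum of 1/M_n diverges
geometric = h_product([2 ** (n + 1) for n in range(60)])   # sum of 1/M_n converges
print(lhs, rhs, linear, geometric)
```

With linear growth the product drifts toward 0, consistent with certain fixation, while with geometric growth it stays bounded away from 0.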
28. Birds arrive at an indefinitely long telegraph wire, their positions behaving like points 
of a line Poisson process with rate $\lambda$. Independently insects land on the wire, their positions 
being those of an independent line Poisson process of rate $\mu$. A bird can eat any insect closer 
to itself than to any other bird. Find the distribution of the meal size of a bird. 


Answer: Probability of meal size consisting of $k$ insects is 

$$(k + 1) \left(\frac{\mu}{2\lambda + \mu}\right)^{k} \frac{4\lambda^2}{(2\lambda + \mu)^2}.$$
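
As a quick consistency check, the stated probabilities should sum to 1 and have mean $\mu/\lambda$ (the average number of insects per bird). The sketch below verifies both numerically; the rates and the truncation point of the sums are arbitrary illustrative choices.

```python
def meal_size_pmf(k, lam, mu):
    """P(meal size = k) = (k + 1) (mu/(2 lam + mu))^k (2 lam/(2 lam + mu))^2."""
    r = mu / (2.0 * lam + mu)
    return (k + 1) * r**k * (1.0 - r) ** 2

lam, mu = 1.5, 4.0     # made-up rates for illustration
probs = [meal_size_pmf(k, lam, mu) for k in range(2000)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
print(total, mean, mu / lam)
```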
*29. Consider a continuous time birth and death process $X(t)$ with infinitesimal birth and 
death rates $\{\lambda_i\}$ and $\{\mu_i\}$ for $i = 0, 1, \ldots$ ($\mu_0 = 0$). Let $P_{ij}(t)$ be the transition probability. 
Consider two particles independently undergoing the process $X(t)$, the first starting at $i$, 
the second at $j$. Let $R_{ij}(t)$ be the probability density function for the time $T$ that the particles 
first coincide (simultaneously occupy the same position). Show that 

$$R_{ij}(t) = \sum_{k=0}^{\infty} (\lambda_k + \mu_{k+1})\,[P_{ik}(t)P_{j,k+1}(t) - P_{i,k+1}(t)P_{jk}(t)].$$


30. A discrete-state continuous time Markov process with transition function $P_{ij}(t)$ 
is called symmetrizable if there exists $\alpha_i > 0$ for which $\alpha_i P_{ij}(t) = \alpha_j P_{ji}(t)$ for all $i$, $j$ and $t > 0$. 
(Every birth and death process is symmetrizable with $\alpha_i = \pi_i$ of Section 5, Chapter 4.) 
A continuous-state process with transition density $p(t, x, y)$ is symmetrizable if there exists 
$\alpha(x) > 0$ for which $\alpha(x)p(t, x, y) = \alpha(y)p(t, y, x)$ for all $x$, $y$ and $t > 0$. Show that a regular 
diffusion process on $(l, r)$ is symmetrizable with the choice $\alpha(x) = m(x)$, the speed density. 


31. Let $\{X_n, n \ge 0\}$ be a finite time-homogeneous aperiodic Markov chain and $\{Y_n, n \ge 1\}$ 
a sequence of identically distributed random variables which are mutually independent 
and also independent of the Markov chain, with $\Pr\{Y_n = r\} = a_r$ for $r = 1, 2, \ldots$ with 
$\sum_{r=1}^{\infty} a_r = 1$. Let $S_0 = 0$, $S_n = Y_1 + Y_2 + \cdots + Y_n$ $(n \ge 1)$ and $Z_n = X_{S_n}$ $(n \ge 0)$. 

(a) Show that $\{Z_n, n \ge 0\}$ is a Markov chain. 

(b) Show that its transition probability matrix is given by $Q = \sum_{r=1}^{\infty} a_r P^r$, where $P$ 
is the transition probability matrix of $\{X_n\}$. 

(c) Show that if $P$ is irreducible, so is $Q$. 

(d) If $P$ is irreducible and recurrent, then show that $\{Z_n\}$ has the same stationary 
distribution as $\{X_n\}$. 
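
Parts (b) and (d) can be illustrated on a small example. The sketch below builds $Q = \sum_r a_r P^r$ for a made-up 3-state matrix `P` and a made-up finitely supported distribution `a`, then checks that the stationary vector of `P` is also stationary for `Q`; the helper names and the use of power iteration are assumptions of the sketch.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def subordinated_matrix(p, weights):
    """Q = sum_r a_r P^r for weights = {r: a_r} with finitely many terms."""
    n = len(p)
    q = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in range(1, max(weights) + 1):
        power = mat_mul(power, p)          # power now holds P^r
        a_r = weights.get(r, 0.0)
        for i in range(n):
            for j in range(n):
                q[i][j] += a_r * power[i][j]
    return q

def stationary(p, iterations=5000):
    """Stationary row vector by power iteration (assumes irreducible, aperiodic)."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi

P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.4, 0.0, 0.6]]
a = {1: 0.2, 2: 0.5, 4: 0.3}      # made-up distribution of Y_n

Q = subordinated_matrix(P, a)
pi = stationary(P)
pi_q = [sum(pi[i] * Q[i][j] for i in range(3)) for j in range(3)]
print(pi, pi_q)
```

The agreement of the two printed vectors reflects the fact that $\pi P = \pi$ implies $\pi P^r = \pi$ for every $r$, hence $\pi Q = \pi$.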


32. Let $X_1, \ldots, X_n$ and $Y$ be random variables having finite second moments. Let $Z$ 
be the minimum mean square error linear predictor of $Y$ given $X_1, \ldots, X_n$ and for 
$k = 1, \ldots, n$, let $\hat{Z}_k$ be the minimum mean square error linear predictor of $Z$ given 
$X_1, \ldots, X_k$. Show that $\hat{Z}_k = \hat{Y}_k$, where $\hat{Y}_k$ is the minimum mean square error linear predictor of $Y$ 
given $X_1, \ldots, X_k$. (Roughly speaking, a best predictor of a best predictor is a best predictor.) 


33. Suppose that $X_1, X_2, \ldots$ are independent and identically distributed random variables 
having the distribution function $F$ and that $N$ is a positive integer-valued random variable, 
independent of $X_1, X_2, \ldots$, and having the generating function 

$$\psi(s) = \sum_{k=1}^{\infty} s^k \Pr\{N = k\}.$$

With $U = \min\{X_1, \ldots, X_N\}$ and $V = \max\{X_1, \ldots, X_N\}$ show that 

$$\Pr\{V \le v\} = \psi[F(v)],$$

$$\Pr\{U \le u\} = 1 - \psi[1 - F(u)],$$

and 

$$\Pr\{u < U,\ V \le v\} = \psi[F(v) - F(u)].$$


34. A particle starting at the origin moves equally likely in any direction on the lattice 
of integer pairs. Letting the successive coordinates be denoted by $(X_n, Y_n)$, compute the 
generating function $g(\theta) = E[\theta^{T(a)}]$ where $T(a)$ is the first time $n$ that $X_n = a$. 


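35. Suppose $X_1, \ldots, X_n$ are independent exponentially distributed random variables with 
parameters $\lambda_1, \ldots, \lambda_n$. Assume $\lambda_j \ne \lambda_k$ unless $j = k$. Set $T_n = X_1 + \cdots + X_n$. Establish 
the formula 

$$\Pr\{T_n > t\} = A_{1,n} e^{-\lambda_1 t} + \cdots + A_{n,n} e^{-\lambda_n t}, \qquad t > 0,$$

where 

$$A_{k,n} = \frac{\lambda_1 \cdots \lambda_{k-1} \lambda_{k+1} \cdots \lambda_n}{(\lambda_1 - \lambda_k) \cdots (\lambda_{k-1} - \lambda_k)(\lambda_{k+1} - \lambda_k) \cdots (\lambda_n - \lambda_k)} = \prod_{\substack{1 \le j \le n \\ j \ne k}} \frac{\lambda_j}{\lambda_j - \lambda_k}.$$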


Hint: Work with the probability density functions. A calculation can be avoided by using 
a symmetry argument. 
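
The coefficient formula is easy to evaluate numerically, which gives two quick checks: the coefficients must sum to 1 (since $\Pr\{T_n > 0\} = 1$), and for $n = 2$ the result must match the elementary convolution formula. The rates below are arbitrary illustrative values and the function names are hypothetical.

```python
import math

def coefficients(rates):
    """A_{k,n} = prod_{j != k} lambda_j / (lambda_j - lambda_k)."""
    out = []
    for k, lk in enumerate(rates):
        a = 1.0
        for j, lj in enumerate(rates):
            if j != k:
                a *= lj / (lj - lk)
        out.append(a)
    return out

def survival(t, rates):
    """Pr{T_n > t} for a sum of independent exponentials with distinct rates."""
    return sum(a * math.exp(-lk * t) for a, lk in zip(coefficients(rates), rates))

rates = [1.0, 2.5, 4.0]     # made-up distinct parameters
coefficient_sum = sum(coefficients(rates))   # equals Pr{T_n > 0}

# Cross-check for n = 2 against direct convolution:
# Pr{T_2 > t} = (l2 e^{-l1 t} - l1 e^{-l2 t}) / (l2 - l1).
l1, l2, t = 1.0, 3.0, 0.7
direct = (l2 * math.exp(-l1 * t) - l1 * math.exp(-l2 * t)) / (l2 - l1)
from_formula = survival(t, [l1, l2])
print(coefficient_sum, from_formula, direct)
```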


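36. Let $\{X(t), t \ge 0\}$ be a pure birth process with distinct parameters $\lambda_1, \lambda_2, \ldots$ and 
starting from $X(0) = 1$. Verify the marginal probability distribution 

$$\Pr\{X(t) = n\} = \lambda_1 \cdots \lambda_{n-1}\,[B_{1,n} e^{-\lambda_1 t} + \cdots + B_{n,n} e^{-\lambda_n t}],$$

where 

$$B_{k,n} = \frac{1}{(\lambda_1 - \lambda_k) \cdots (\lambda_{k-1} - \lambda_k)(\lambda_{k+1} - \lambda_k) \cdots (\lambda_n - \lambda_k)}.$$
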
37. Let $\{X(t), t \ge 0\}$ be a modified pure birth process for which 

$$\Pr\{X(t + h) = n + 1 \mid X(t) = n\} = \lambda_n h + o(h),$$

$$\Pr\{X(t + h) = 0 \mid X(t) = n\} = \theta_n h + o(h),$$

and 

$$\Pr\{X(t + h) = n \mid X(t) = n\} = 1 - (\lambda_n + \theta_n)h + o(h).$$

Assume that $\lambda_0 = 0$ so that 0 is an absorbing state or trap, and suppose $X(0) = 1$. Establish 
the formula 

$$\Pr\{X(t) = n\} = \sum_{k=1}^{n} C_{k,n} \exp[-(\lambda_k + \theta_k)t],$$

where 

$$C_{k,n} = \lambda_1 \cdots \lambda_{n-1} \prod_{\substack{1 \le l \le n \\ l \ne k}} \frac{1}{\lambda_l + \theta_l - \lambda_k - \theta_k},$$

the formula holding whenever $\lambda_l + \theta_l \ne \lambda_j + \theta_j$ unless $l = j$. 


38. The “Peter principle” asserts that a worker will be promoted until first reaching a 
position in which he or she is incompetent. When this happens, they stay in that job until 
retirement. Examine the following single job model of the Peter principle: A person is 
selected at random from the population and placed in the job. If he or she is competent, 
he or she remains in the job for a random time having cumulative distribution function F 
and mean $\mu$ and is promoted. If incompetent, this person remains for a random time having 
cumulative distribution function $G$ and mean $\nu > \mu$, and retires. Once the job is vacated, 
another person is selected at random and the process repeats. Assume that the infinite 
population contains the fraction $p$ of competent people and $q = 1 - p$ incompetent ones. 
The following two questions may be done in either order. 

(a) In the long run, what fraction of time is the position held by an incompetent 
person? 

(b) Establish a renewal equation for A(t) = Pr{a competent person is in the job 
at time t} in terms of p, q, F, G and H(t) = pF(t) + qG(t). 


39. Suppose that $X_0$ is uniformly distributed on [0, 1], and, given $X_n$, suppose that $X_{n+1}$ 
is uniformly distributed on $[0, X_n]$. Show that $Z_n = 2^n X_n$ is a nonnegative martingale and 
find its limit $Z_\infty = \lim_{n \to \infty} Z_n$. 


40. A process $\{X(t), t > 0\}$ at time $t$ has value either $+t$ or $-t$. Given that $X(t) = \pm t$, 
a jump to $\mp t$ occurs in the infinitesimal interval $dt$ with probability $p(t)\, dt$. Specify $p(t)$ 
for $t > 0$ so that $\{X(t), t > 0\}$ is a martingale. 


41. Let $X_1, X_2, \ldots$ be independent random variables sharing the exponential distribution 
in which $\Pr\{X_k > t\} = e^{-t}$, $t > 0$. A record value is an observation $X_k$ that exceeds all 
previous observations and its index $k$ is called a record mark. Let $K(1), K(2), \ldots$ be the 
successive record marks and $Z_1, Z_2, \ldots$ be the record values. That is, beginning with 
$K(1) = 1$ and $Z_1 = X_1$, we have $Z_n = X_{K(n)}$ where 

$$K(n) = \min\{k : X_k > X_{K(n-1)}\}.$$

(a) Determine the mean value function $m_n = E[Z_n]$. 
(b) Determine the mean value function $\mu_n = E[K(n)]$. 
42. A single fiber subjected to the time varying tensile load $l(t)$ fails at a random time $T$. 
We postulate the failure time distribution 

$$\Pr\{T \le t\} = 1 - \exp\left\{-\int_0^t K[l(s)]\, ds\right\},$$

which corresponds to the failure rate or hazard rate of 

$$r(t) = K[l(t)].$$

That is, a single fiber not having failed prior to time $t$ and carrying load $l(t)$ will fail during 
the interval $[t, t + \Delta t)$ with probability $K[l(t)]\, \Delta t + o(\Delta t)$. 

The function $K$, called the breakdown rule, expresses how changes in the load affect 
the failure probability. We will be concerned with the power law breakdown rule in which 
$K(l) = l^{\beta}$, for some fixed positive $\beta$. Under a constant load $l(t) = l$, the failure time is 
exponentially distributed with mean $E[T \mid l] = 1/K(l) = l^{-\beta}$. Under the assumed power law 
breakdown, a plot of mean failure time versus load is linear on log-log axes, a commonly 
observed phenomenon in fatigue studies. 

Now place $n$ of these fibers in parallel and subject the resulting bundle to a total load, 
constant in time, of $nL$, where $L$ is the nominal load per fiber. Assuming that all nonfailed 
fibers share the total load $nL$ equally, model the number $N(t)$ of unfailed fibers at time $t$ 
as a pure death process. Specify the death parameters $\mu_k$. Since the fibers are in parallel, the 
bundle failure time $T_n$ equals the failure time of the last fiber. Express $E[T_n]$ as a sum whose 
terms depend on $L$ and $\beta$. Determine $\lim_{n \to \infty} E[T_n]$ when $\beta > 1$. 


Answer: $\mu_k = k(nL/k)^{\beta}$, $k = 1, \ldots, n$. 

$$E[T_n] = \sum_{k=1}^{n} \frac{1}{\mu_k} = L^{-\beta} \sum_{k=1}^{n} \left(\frac{k}{n}\right)^{\beta - 1} \frac{1}{n} \to L^{-\beta} \int_0^1 x^{\beta - 1}\, dx = \frac{1}{\beta L^{\beta}}.$$
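
The convergence of the Riemann sum in the answer can be observed directly. The sketch below evaluates $E[T_n]$ for increasing $n$ and compares it with the limit $1/(\beta L^{\beta})$; the values of `L` and `beta` and the function names are illustrative assumptions.

```python
def mean_bundle_failure_time(n, load, beta):
    """E[T_n] = sum_{k=1}^n 1/mu_k with mu_k = k (n L / k)^beta."""
    return sum(1.0 / (k * (n * load / k) ** beta) for k in range(1, n + 1))

def limiting_mean(load, beta):
    """lim_n E[T_n] = L^{-beta} * integral_0^1 x^{beta-1} dx = 1/(beta L^beta)."""
    return 1.0 / (beta * load ** beta)

L, beta = 1.2, 3.0      # made-up nominal load and exponent
for n in (10, 100, 10_000):
    print(n, mean_bundle_failure_time(n, L, beta))
print("limit", limiting_mean(L, beta))
```

For $\beta = 1$ every term equals $1/(nL)$, so $E[T_n] = 1/L$ for all $n$, which is a convenient special case to test against.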

43. A “k out of n: F” repairable system consists of $n$ repairable units operating in parallel 
which are serviced by a single repairman. System failure occurs when $k$ units are simultaneously 
inoperable. Each unit fails with a constant failure rate $\lambda$ so that failure times are 
exponentially distributed. Repair times follow an arbitrary distribution $F$. Determine 
the mean time to first system failure when $k = 2$. 


Solution: Let $A$ be the mean failure time, $T_1$ the time of the first failure, $T_1 + T_2$ the time 
of the second failure, and $R$ the first repair time. Then, conditioning on the first fail-repair 
cycle, we obtain 

$$A = E[T_1] + E[\min\{T_2, R\}] + A \Pr\{R < T_2\},$$

and 

$$A = \frac{E[T_1] + E[\min\{T_2, R\}]}{\Pr\{R > T_2\}} = \frac{\dfrac{1}{n\lambda} + \displaystyle\int_0^{\infty} e^{-(n-1)\lambda t}[1 - F(t)]\, dt}{\displaystyle\int_0^{\infty} [1 - F(t)](n - 1)\lambda e^{-(n-1)\lambda t}\, dt} = \frac{\dfrac{1}{n\lambda} + F^*[(n - 1)\lambda]}{(n - 1)\lambda F^*[(n - 1)\lambda]},$$

where 

$$F^*(s) = \int_0^{\infty} e^{-st}[1 - F(t)]\, dt.$$
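
The final formula can be evaluated numerically for any repair distribution once $F^*$ is computed by quadrature. The sketch below does this for exponential repair times with rate `nu`, a case in which $F^*(s) = 1/(s + \nu)$ and the answer also follows from first-step equations for the Markov chain on the number of failed units. The truncation point of the integral, the parameter values, and the function names are assumptions of the sketch.

```python
import math

def laplace_of_survival(s, survival, upper=60.0, steps=200_000):
    """F*(s) = integral_0^inf e^{-st} [1 - F(t)] dt, by the trapezoid rule
    on [0, upper]; the truncation point is an assumption of this sketch."""
    h = upper / steps
    total = 0.5 * (survival(0.0) + math.exp(-s * upper) * survival(upper))
    for i in range(1, steps):
        t = i * h
        total += math.exp(-s * t) * survival(t)
    return total * h

def mean_time_to_failure(n, lam, survival):
    """A = (1/(n lam) + F*((n-1) lam)) / ((n-1) lam F*((n-1) lam))."""
    s = (n - 1) * lam
    f_star = laplace_of_survival(s, survival)
    return (1.0 / (n * lam) + f_star) / (s * f_star)

# Exponential repair with rate nu (made-up values).
n, lam, nu = 4, 0.5, 2.0
numeric = mean_time_to_failure(n, lam, lambda t: math.exp(-nu * t))
s = (n - 1) * lam
closed_form = (1.0 / (n * lam) + 1.0 / (s + nu)) * (s + nu) / s
print(numeric, closed_form)
```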


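44. Let $\{B(t)\}$ be a standard Brownian motion. For $t > 0$ and $-\infty < x < \infty$, find the 
Markov time $T^*$ that maximizes $E[r(x + B(T), t + T)]$ where $r(x, t) = (x - t)^+ = \max\{0, x - t\}$. 


Answer: $T^* = \inf\{s \ge 0 : B(s) - s \ge \tfrac{1}{2} + t - x\}$ and 

$$v(x, t) = \sup_T E[r(x + B(T), t + T)] = \begin{cases} x - t & \text{for } x - t \ge \tfrac{1}{2}, \\ \tfrac{1}{2} e^{2(x - t) - 1} & \text{for } x - t < \tfrac{1}{2}. \end{cases}$$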
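
The stated value function can be checked against the usual free-boundary conditions for this kind of stopping problem: it should match the payoff with a continuous slope at the boundary $x - t = \tfrac{1}{2}$, dominate the payoff everywhere, and be annihilated by the generator $\tfrac{1}{2}\,d^2/dy^2 - d/dy$ of $B(s) - s$ below the boundary. The following is a finite-difference sketch of those checks in the variable $y = x - t$; the step sizes and test points are arbitrary choices.

```python
import math

def value(y):
    """v as a function of y = x - t: y for y >= 1/2, else (1/2) e^{2y - 1}."""
    return y if y >= 0.5 else 0.5 * math.exp(2.0 * y - 1.0)

def generator_residual(y, h=1e-4):
    """Finite-difference value of (1/2) v'' - v' at y, the generator of
    B(s) - s applied to v (it should vanish in the continuation region)."""
    second = (value(y + h) - 2.0 * value(y) + value(y - h)) / h**2
    first = (value(y + h) - value(y - h)) / (2.0 * h)
    return 0.5 * second - first

b = 0.5
eps = 1e-6
left_slope = (value(b) - value(b - eps)) / eps      # slope just below the boundary
right_slope = (value(b + eps) - value(b)) / eps     # slope just above the boundary
residual = generator_residual(-0.3)
dominates = all(value(y / 10.0) >= max(0.0, y / 10.0) for y in range(-30, 30))
print(left_slope, right_slope, residual, dominates)
```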


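45. Let $\{B(t)\}$ be a standard Brownian motion. For fixed $\alpha, \lambda > 0$, $-\infty < x < \infty$, 
find the Markov time $T$ that maximizes $E[e^{-\lambda T}\{x + B(T)\}^{\alpha}]$. 


Answer: $T^* = \inf\{t \ge 0 : x + B(t) \ge \alpha/\sqrt{2\lambda}\}$ and 

$$v(x) = \sup_T E[e^{-\lambda T}\{x + B(T)\}^{\alpha}] = \begin{cases} x^{\alpha} & \text{for } x \ge \alpha/\sqrt{2\lambda}, \\ \left(\dfrac{\alpha}{\sqrt{2\lambda}}\right)^{\alpha} \exp\left(\sqrt{2\lambda}\, x - \alpha\right) & \text{for } x < \alpha/\sqrt{2\lambda}. \end{cases}$$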

